Abstract. A central method for analyzing the asymptotic complexity of a functional program is to extract and then solve a recurrence that expresses evaluation cost in terms of
input size. The relevant notion of input size is often specific to a datatype, with measures 
including the length of a list, the maximum element in a list, and the height of a tree. In 
this work, we give a formal account of the extraction of cost and size recurrences from 
higher-order functional programs over inductive datatypes. Our approach allows a wide 
range of programmer-specified notions of size, and ensures that the extracted recurrences 
correctly predict evaluation cost. To extract a recurrence from a program, we first make 
costs explicit by applying a monadic translation from the source language to a complexity 
language, and then abstract datatype values as sizes. Size abstraction can be done semantically, working in models of the complexity language, or syntactically, by adding rules to
a preorder judgement. We give several different models of the complexity language, which 
support different notions of size. Additionally, we prove by a logical relations argument 
that recurrences extracted by this process are upper bounds for evaluation cost; the proof 
is entirely syntactic and therefore applies to all of the models we consider. 


1. Introduction 

The typical method for analyzing the asymptotic complexity of a functional program is to 
extract a recurrence that relates the function’s running time to the size of the function’s 
input, and then solve the recurrence to obtain a closed form and big-O bound. Automated complexity analysis (see the related work in Section 7) provides helpful information to programmers, and could be particularly useful for giving feedback to students. In a setting with
higher-order functions and programmer-defined datatypes, automating the extract-and-solve 
method requires a generalization of the standard theory of recurrences. This generalization 
must include a notion of recurrence for higher-order functions such as map and fold, as 
well as a general theory of what constitutes “the size of the input” for programmer-defined 
datatypes. 

One notion of recurrence for higher-order functions was developed in previous work by Danner and Royer [2009] and Danner et al. [2013]. Because the output of one function is the input to another, it is necessary to extract from a function not only a recurrence for
the running time, but also a recurrence for the size of the output. These can be packaged 
together as a single recurrence that, given the size of the input, produces a pair consisting of 
the running time (called the cost) and the size of the output (called the potential). Whereas 
the former is the cost of executing the program to a value, the latter determines the cost 
of using that value. This generalizes naturally to higher-order functions: a recurrence for a 
higher-order function is itself a higher-order function, which expresses the cost and potential 
of the result in terms of a given recurrence for the cost and potential of the argument function. 
The process of extracting recurrences can thus be seen as a denotational semantics of the 
program, where a function is interpreted as a function from input potential to cost and 
output potential. 
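The following Python sketch illustrates, under simplifying assumptions, why a recurrence must report both a cost and a potential. A recurrence is modeled as a function from an input size to a (cost, potential) pair; `dup_rec`, `traverse_rec`, and `compose_rec` are hypothetical names chosen for illustration, not part of the formal development. Composing two recurrences needs the potential of the first one, not only its cost.

```python
from typing import Callable, Tuple

# A recurrence maps the size (potential) of an input to a pair
# (cost of evaluation, potential of the result).
Recurrence = Callable[[int], Tuple[int, int]]


def dup_rec(n: int) -> Tuple[int, int]:
    """Hypothetical: a function taking n steps and returning a list twice as long."""
    return (n, 2 * n)


def traverse_rec(n: int) -> Tuple[int, int]:
    """Hypothetical: a linear traversal returning a single number."""
    return (n, 1)


def compose_rec(g: Recurrence, f: Recurrence) -> Recurrence:
    """Recurrence for g(f(x)): the potential of f's result is g's input size."""
    def h(n: int) -> Tuple[int, int]:
        c_f, p_f = f(n)
        c_g, p_g = g(p_f)
        return (c_f + c_g, p_g)
    return h


# The cost of traverse(dup(xs)) is n + 2n: the second term needs dup's potential.
print(compose_rec(traverse_rec, dup_rec)(10))  # (30, 1)
```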

Building on this work, we give a formal account of the extraction of recurrences from 
higher-order functional programs over inductive datatypes, focusing on how to soundly allow
programmer-specified sizes of datatypes. We show that under some mild conditions on sizes, 
the cost predicted by an extracted recurrence is in fact an upper bound on the number of 
steps the program takes to evaluate. The size of a value can be taken to be (essentially) the 
value itself, in which case one gets exact bounds but must reason about all the details of 
program evaluation, or the size of a value can forget information (e.g. abstracting a list as 
its length), in which case one gets weaker bounds with more traditional reasoning. 

We start from a call-by-value source language, defined in Section 2, with strictly positive inductive datatype definitions (which include lists and finitely branching trees, as well as infinitely branching trees). Datatypes are used via case analysis and structural recursion (so the language is terminating), but unlike in Danner et al. [2013], recursive calls are only evaluated if necessary; for example, recurring on one branch of a tree has a different cost than recurring on both branches. The cost of a program is defined by a standard operational cost semantics, an evaluation relation annotated with costs. For simplicity, the cost semantics measures only the number of function applications and recursive calls made during evaluation, but our approach to extracting recurrences generalizes to other cost models.

We extract a recurrence from such a program in two steps. First, in Section 3, we make the cost of evaluating a program explicit, by translating a source program $e$ to a program $\|e\|$ in a complexity language. The complexity language has an additional type $\mathbf{C}$ for costs, and the translation to the complexity language is a call-by-value monadic translation into the writer monad $\mathbf{C} \times -$ [Moggi, 1991; Wadler, 1992]. The translated program $\|e\|$ returns an additional result, which is the cost of running the original program $e$.

Second, we abstract values to sizes; we study both semantic and syntactic approaches. In Section 4, we give a size-based semantics of the complexity language, which relies on programmer-specified size functions mapping each datatype to the natural numbers (or some other preorder). Typical size functions include the length of a list and the size or depth of a tree. The semantics satisfies a bounding theorem (Theorem 4.2), which implies that the denotational cost given by composing the source-to-complexity translation with the size-based semantics is in fact an upper bound on the operational cost. We show on some examples that the recurrence or cost extracted by this process is the expected one; we will also show later that all the examples in Danner et al. [2013] carry over.


Alternatively, the abstraction of values to sizes can be done syntactically in the complexity language, by imposing a preorder structure on the values of the datatype themselves. For example, rather than mapping lists to numbers representing their lengths, we can order the list values by rules including $xs \le (x{::}xs)$ and $(x{::}xs) \le (y{::}xs)$. The second rule says that the elements of the list are irrelevant, quotienting the lists down to natural numbers, and the first generates the usual order on natural numbers. Formally, we equip the complexity language with a judgement $E \le E'$ that can be used to make such abstractions. In Section 5, we identify properties of this judgement that are sufficient to prove a syntactic bounding theorem (Theorem 5.7), which states that the operational cost is bounded by the cost component of the complexity translation. The key technical notion is a logical relation between the source and complexity languages that extends the bounding relation of Danner et al. [2013] to inductive types. This proof gives a bounding theorem for any model of the complexity language that validates the rules for $\le$. In Section 6, we show that these rules are valid in the size-based semantics of Section 4 (thereby proving Theorem 4.2), and we discuss several other models of the complexity language.
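To make the quotienting claim concrete, the Python sketch below closes the two list rules under reflexivity and transitivity and checks, on all lists of length at most 3 over a two-element alphabet, that $xs \le ys$ is derivable exactly when $xs$ is no longer than $ys$. The alphabet, the bound, and the helper names are arbitrary choices for illustration. We also assume, as for a precongruence, that the judgement is closed under the list constructor; that assumption is what lets the second rule change elements below the head.

```python
from itertools import product

ALPHABET = (0, 1)   # hypothetical finite element type; lists are tuples


def one_step(xs):
    """Lists ys such that xs <= ys follows from a single rule instance."""
    out = set()
    for x in ALPHABET:              # rule: xs <= x::xs
        out.add((x,) + xs)
    if xs:
        head, tail = xs[0], xs[1:]
        for y in ALPHABET:          # rule: x::xs <= y::xs
            out.add((y,) + tail)
        for t in one_step(tail):    # assumed congruence: xs <= ys gives z::xs <= z::ys
            out.add((head,) + t)
    return out


def upper_set(xs, max_len):
    """Lists of length <= max_len reachable from xs (reflexive-transitive closure).

    No rule shortens a list, so bounding the length loses no derivations.
    """
    seen, frontier = {xs}, [xs]
    while frontier:
        cur = frontier.pop()
        for nxt in one_step(cur):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def all_lists(max_len):
    return [tuple(p) for n in range(max_len + 1) for p in product(ALPHABET, repeat=n)]


def agrees_with_length(max_len=3):
    """True iff: xs <= ys is derivable exactly when len(xs) <= len(ys)."""
    lists = all_lists(max_len)
    for a in lists:
        ups = upper_set(a, max_len)
        for b in lists:
            if (b in ups) != (len(a) <= len(b)):
                return False
    return True


print(agrees_with_length(3))  # True
```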

This gives a formal account of what it means to extract a recurrence from higher-order 
programs on inductive data. We leave an investigation of what it means to solve these 
higher-order recurrences to future work. 


2. Source Language with Inductive Data Types 

The source language is a simply-typed $\lambda$-calculus with product types, function types, suspensions, and strictly positive inductive datatypes. Its syntax and typing are given in Figure 2, and its operational semantics in Figure 3. We bundle sums and inductive types together as datatypes, rather than using separate $+$ and $\mu$ types, because below we do not want to consider sizes for the sum part separately.

We assume a top-level signature $\psi$ consisting of datatype declarations of the form
$$\mathsf{datatype}\ \delta = C_0\ \mathsf{of}\ \phi_{C_0}[\delta] \mid \cdots \mid C_{n-1}\ \mathsf{of}\ \phi_{C_{n-1}}[\delta]$$
Each constructor's argument type is specified by a strictly positive functor $\phi$. These include the identity functor ($t$), representing a recursive occurrence of the datatype; constant functors ($\tau$), representing a non-recursive argument; product functors ($\phi_0 \times \phi_1$), representing a pair of arguments; and constant exponentials ($\tau \to \phi$), representing an argument of function type. We write $\phi[\tau/t]$ or just $\phi[\tau]$ for substitution of the type $\tau$ for the single free type variable $t$ in $\phi$. We frequently drop the indexing superscripts, write $\mathsf{datatype}\ \delta = C\ \mathsf{of}\ \phi_C$, and write $C$ rather than $C_i$ to refer to one of the constructors of the declaration. In the signature, each $\phi_C$ in each datatype declaration must refer only to datatypes that are declared earlier in the sequence, to avoid introducing general recursive datatypes. We write $C : (\phi \to \delta) \in \psi$ to mean that the signature $\psi$ contains a datatype declaration of the form $\mathsf{datatype}\ \delta = \ldots \mid C\ \mathsf{of}\ \phi[\delta] \mid \ldots$. The formal definitions of signatures, types, and constructor arguments are given in Figure 1.
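Signatures: $\psi\ \mathsf{sig}$.
$$\frac{}{()\ \mathsf{sig}} \qquad \frac{\psi\ \mathsf{sig} \quad \delta \notin \psi \quad \forall C\,(\psi \vdash \phi_C\ \mathsf{ok})}{\psi,\ \mathsf{datatype}\ \delta = C\ \mathsf{of}\ \phi_C[\delta]\ \mathsf{sig}}$$

Types: $\psi \vdash \tau\ \mathsf{type}$.
$$\frac{}{\psi \vdash \mathsf{unit}\ \mathsf{type}} \qquad \frac{\psi \vdash \tau_0\ \mathsf{type} \quad \psi \vdash \tau_1\ \mathsf{type}}{\psi \vdash \tau_0 \times \tau_1\ \mathsf{type}} \qquad \frac{\psi \vdash \tau_0\ \mathsf{type} \quad \psi \vdash \tau_1\ \mathsf{type}}{\psi \vdash \tau_0 \to \tau_1\ \mathsf{type}}$$
$$\frac{\psi \vdash \tau\ \mathsf{type}}{\psi \vdash \mathsf{susp}\ \tau\ \mathsf{type}} \qquad \frac{\delta \in \psi}{\psi \vdash \delta\ \mathsf{type}}$$

Constructor arguments: $\psi \vdash \phi\ \mathsf{ok}$.
$$\frac{}{\psi \vdash t\ \mathsf{ok}} \qquad \frac{\psi \vdash \tau\ \mathsf{type}}{\psi \vdash \tau\ \mathsf{ok}} \qquad \frac{\psi \vdash \phi_0\ \mathsf{ok} \quad \psi \vdash \phi_1\ \mathsf{ok}}{\psi \vdash \phi_0 \times \phi_1\ \mathsf{ok}} \qquad \frac{\psi \vdash \tau\ \mathsf{type} \quad \psi \vdash \phi\ \mathsf{ok}}{\psi \vdash \tau \to \phi\ \mathsf{ok}}$$

Figure 1. Valid signatures, types, and constructor arguments.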


We define the expressions $e$ and the typing judgment $\gamma \vdash e : \tau$ in Figure 2. As we will do in most of the rest of the paper, here we elide reference to the signature and just refer to types and constructor arguments. On the occasions when precision is crucial, we annotate the typing judgment with the signature, as in $\gamma \vdash_\psi e : \tau$.

Evaluation (defined in Figure 3) is call-by-value, and products and datatypes are strict. However, unfolding datatype recursors requires substituting expressions (the recursor applied to the components of the value) for the variables standing for the recursive calls; running each recursive call first and substituting its value would require a function to make all possible recursive calls. We handle this using suspensions: when computing a $\tau$ by recursion, the result of a recursive call is given the type $\mathsf{susp}\ \tau$. The values of type $\mathsf{susp}\ \tau$ are $\mathsf{delay}(e)$ where $e$ is an expression of type $\tau$; the elimination form $\mathsf{force}$ forces evaluation. In general, when defining a recursive computation of result type $\tau$, the branch $e_C$ for a constructor $C$ has access to a variable of type $\phi_C[\delta \times \mathsf{susp}\ \tau]$, which gives access both to the "predecessor" values of type $\delta$ and to the recursive results. This recursor supports both case analysis and structural recursion, and recursive calls are only computed if they are used.
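The effect of passing recursive results as suspensions can be seen in a small Python sketch, where zero-argument closures play the role of $\mathsf{delay}(\ldots)$ and calling them plays the role of $\mathsf{force}$. The tree encoding, the `rec_tree` helper, and the charge of one step per recursor unfolding are illustrative assumptions that only approximate the cost model of Figure 3 (function applications are not counted here).

```python
class Counter:
    """Counts recursor unfoldings (one step per call of rec_tree)."""
    def __init__(self) -> None:
        self.steps = 0


def rec_tree(t, on_e, on_n, counter):
    """Structural recursion over trees ("E",) | ("N", n, t0, t1).

    on_n receives (n, (t0, r0), (t1, r1)), where r0 and r1 are thunks standing
    for delay(rec(t_i, ...)); a recursive call runs only if its thunk is called.
    """
    counter.steps += 1
    if t[0] == "E":
        return on_e()
    _, n, t0, t1 = t
    r0 = lambda: rec_tree(t0, on_e, on_n, counter)
    r1 = lambda: rec_tree(t1, on_e, on_n, counter)
    return on_n(n, (t0, r0), (t1, r1))


def full(d):
    """A complete tree of depth d, labelled by depth."""
    return ("E",) if d == 0 else ("N", d, full(d - 1), full(d - 1))


def count_nodes(t):
    """Forces both recursive results at every node."""
    c = Counter()
    v = rec_tree(t, lambda: 0, lambda n, left, right: 1 + left[1]() + right[1](), c)
    return v, c.steps


def left_spine(t):
    """Forces only the left recursive result at every node."""
    c = Counter()
    v = rec_tree(t, lambda: 0, lambda n, left, right: 1 + left[1](), c)
    return v, c.steps


print(count_nodes(full(3)))  # (7, 15)
print(left_spine(full(3)))   # (3, 4)
```

Recurring on one branch touches only the 4 trees along the left spine, while recurring on both touches all 15 subtrees.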

For any strictly positive functor $\phi$, the $\mathsf{map}^\phi$ expression witnesses functoriality, essentially lifting a function $\tau_0 \to \tau_1$ to a function $\phi[\tau_0] \to \phi[\tau_1]$. It is used in the operational semantics for the recursor to insert recursive calls at the right places in $\phi$ (Harper [2013] provides an exposition). We will only need to lift maps $x : \tau_0.\, v : \tau_1$ whose bodies are syntactic values (or variables), and apply them to syntactic values (or variables), and we restrict $\mathsf{map}$ to this special case to simplify its cost semantics.
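As an informal illustration of $\mathsf{map}^\phi$, the following Python function is defined by induction on a functor description, with one case per form of strictly positive functor. The tagged-tuple encoding of functors is a hypothetical choice for this sketch; a Python function value plays the role of a constant exponential, and post-composition mirrors the $\mathsf{let}$ used for that case in the operational semantics.

```python
def fmap(phi, f, v):
    """Lift f : A -> B to phi[A] -> phi[B], by induction on the functor phi.

    Functor descriptions (hypothetical encoding):
      ("t",)             identity: a recursive occurrence
      ("const",)         constant functor tau: t does not occur
      ("prod", p0, p1)   product p0 x p1
      ("arrow", p)       constant exponential tau -> p
    """
    tag = phi[0]
    if tag == "t":
        return f(v)
    if tag == "const":
        return v
    if tag == "prod":
        _, p0, p1 = phi
        return (fmap(p0, f, v[0]), fmap(p1, f, v[1]))
    if tag == "arrow":
        _, p = phi
        return lambda y: fmap(p, f, v(y))
    raise ValueError(f"unknown functor description: {phi!r}")


# Argument functor of N in  datatype tree = E of unit | N of int x tree x tree.
PHI_N = ("prod", ("const",), ("prod", ("t",), ("t",)))
pair_with_result = lambda sub: (sub, f"rec({sub})")   # stands in for x.<x, e>
print(fmap(PHI_N, pair_with_result, (5, ("t0", "t1"))))
# (5, (('t0', 'rec(t0)'), ('t1', 'rec(t1)')))
```

The label `5` passes through unchanged, while each subtree is paired with a placeholder for its recursive result, as in the binary-tree example that follows.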
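Types:
$$\tau ::= \mathsf{unit} \mid \tau \times \tau \mid \tau \to \tau \mid \mathsf{susp}\ \tau \mid \delta$$
$$\phi ::= t \mid \tau \mid \phi \times \phi \mid \tau \to \phi$$
$$\mathsf{datatype}\ \delta = C_0\ \mathsf{of}\ \phi_{C_0}[\delta] \mid \cdots \mid C_{n-1}\ \mathsf{of}\ \phi_{C_{n-1}}[\delta]$$

Expressions:
$$v ::= x \mid () \mid \langle v, v\rangle \mid \lambda x.e \mid \mathsf{delay}(e) \mid C\,v$$
$$\begin{aligned} e ::= {} & x \mid () \mid \langle e, e\rangle \mid \mathsf{split}(e, x.x.e) \mid \lambda x.e \mid e\,e \mid \mathsf{delay}(e) \mid \mathsf{force}(e) \\ & \mid C^\delta\, e \mid \mathsf{rec}^\delta(e, C \mapsto x.e_C) \mid \mathsf{map}^\phi(x.v, v) \mid \mathsf{let}(e, x.e) \end{aligned}$$
$$n ::= 0 \mid 1 \mid n + n$$

Typing: $\gamma \vdash e : \tau$.
$$\frac{}{\gamma, x : \sigma \vdash x : \sigma} \qquad \frac{}{\gamma \vdash () : \mathsf{unit}} \qquad \frac{\gamma \vdash e_0 : \tau_0 \quad \gamma \vdash e_1 : \tau_1}{\gamma \vdash \langle e_0, e_1\rangle : \tau_0 \times \tau_1} \qquad \frac{\gamma \vdash e_0 : \tau_0 \times \tau_1 \quad \gamma, x_0 : \tau_0, x_1 : \tau_1 \vdash e_1 : \tau}{\gamma \vdash \mathsf{split}(e_0, x_0.x_1.e_1) : \tau}$$
$$\frac{\gamma, x : \sigma \vdash e : \tau}{\gamma \vdash \lambda x.e : \sigma \to \tau} \qquad \frac{\gamma \vdash e_0 : \sigma \to \tau \quad \gamma \vdash e_1 : \sigma}{\gamma \vdash e_0\,e_1 : \tau} \qquad \frac{\gamma \vdash e : \tau}{\gamma \vdash \mathsf{delay}(e) : \mathsf{susp}\ \tau} \qquad \frac{\gamma \vdash e : \mathsf{susp}\ \tau}{\gamma \vdash \mathsf{force}(e) : \tau}$$
$$\frac{\gamma \vdash e : \phi_C[\delta]}{\gamma \vdash C^\delta\, e : \delta} \qquad \frac{\gamma \vdash e : \delta \quad \forall C\,(\gamma, x : \phi_C[\delta \times \mathsf{susp}\ \tau] \vdash e_C : \tau)}{\gamma \vdash \mathsf{rec}^\delta(e, C \mapsto x.e_C) : \tau}$$
$$\frac{\gamma, x : \tau_0 \vdash v_1 : \tau_1 \quad \gamma \vdash v_0 : \phi[\tau_0]}{\gamma \vdash \mathsf{map}^\phi(x.v_1, v_0) : \phi[\tau_1]} \qquad \frac{\gamma \vdash e_0 : \sigma \quad \gamma, x : \sigma \vdash e_1 : \tau}{\gamma \vdash \mathsf{let}(e_0, x.e_1) : \tau}$$

Figure 2. Source language syntax and typing.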


A couple of examples may be more edifying than the formalism. In these examples and in the future, we use a sugared syntax of pattern variables for the constructor arguments. So in our first example we write $\mathsf{rec}(\ldots, N \mapsto \langle n, \langle t_0, r_0\rangle, \langle t_1, r_1\rangle\rangle.e_N)$, where $e_N = e_N(n, t_0, r_0, t_1, r_1)$, as syntactic sugar for $\mathsf{rec}(\ldots, N \mapsto x.e'_N)$, where
$$e'_N = \mathsf{split}(x, n.y.\,\mathsf{split}(y, u.x.\,\mathsf{split}(u, t_0.r_0.\,\mathsf{split}(x, t_1.r_1.e_N))))$$

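As a first example, consider the type of int-labeled binary trees:
$$\mathsf{datatype}\ \mathsf{tree} = E\ \mathsf{of}\ \mathsf{unit} \mid N\ \mathsf{of}\ \mathsf{int} \times \mathsf{tree} \times \mathsf{tree}$$

Now consider a recursive definition $\mathsf{rec}(N\langle n, t_0, t_1\rangle, E \mapsto x.e_E, N \mapsto x.e_N)$. For the $N$-clause, $x : \mathsf{int} \times (\mathsf{tree} \times \mathsf{susp}\ \tau) \times (\mathsf{tree} \times \mathsf{susp}\ \tau)$. Thus the evaluation must substitute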
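Operational semantics: $e \Downarrow^n v$.
$$\frac{}{() \Downarrow^0 ()} \qquad \frac{}{\lambda x.e \Downarrow^0 \lambda x.e} \qquad \frac{e_0 \Downarrow^{n_0} v_0 \quad e_1 \Downarrow^{n_1} v_1}{\langle e_0, e_1\rangle \Downarrow^{n_0+n_1} \langle v_0, v_1\rangle} \qquad \frac{e_0 \Downarrow^{n_0} \langle v_0, v_1\rangle \quad e_1[v_0/x_0, v_1/x_1] \Downarrow^{n_1} v}{\mathsf{split}(e_0, x_0.x_1.e_1) \Downarrow^{n_0+n_1} v}$$
$$\frac{e_0 \Downarrow^{n_0} \lambda x.e_0' \quad e_1 \Downarrow^{n_1} v_1 \quad e_0'[v_1/x] \Downarrow^{n} v}{e_0\,e_1 \Downarrow^{1+n_0+n_1+n} v}$$
$$\frac{}{\mathsf{delay}(e) \Downarrow^0 \mathsf{delay}(e)} \qquad \frac{e \Downarrow^{n_0} \mathsf{delay}(e_0) \quad e_0 \Downarrow^{n_1} v}{\mathsf{force}(e) \Downarrow^{n_0+n_1} v}$$
$$\frac{e \Downarrow^n v}{C\,e \Downarrow^n C\,v} \qquad \frac{e \Downarrow^{n_0} C\,v_0 \quad \mathsf{map}^{\phi_C}(y.\langle y, \mathsf{delay}(\mathsf{rec}(y, C \mapsto x.e_C))\rangle, v_0) \Downarrow^{n_1} v_1 \quad e_C[v_1/x] \Downarrow^{n_2} v}{\mathsf{rec}(e, C \mapsto x.e_C) \Downarrow^{1+n_0+n_1+n_2} v}$$
$$\frac{}{\mathsf{map}^t(x.v, v_0) \Downarrow^0 v[v_0/x]} \qquad \frac{}{\mathsf{map}^\tau(x.v, v_0) \Downarrow^0 v_0}\ (t \text{ not free in } \tau)$$
$$\frac{\mathsf{map}^{\phi_0}(x.v, v_0) \Downarrow^0 v_0' \quad \mathsf{map}^{\phi_1}(x.v, v_1) \Downarrow^0 v_1'}{\mathsf{map}^{\phi_0 \times \phi_1}(x.v, \langle v_0, v_1\rangle) \Downarrow^0 \langle v_0', v_1'\rangle} \qquad \frac{}{\mathsf{map}^{\tau \to \phi}(x.v, \lambda y.e) \Downarrow^0 \lambda y.\,\mathsf{let}(e, z.\,\mathsf{map}^\phi(x.v, z))}$$
$$\frac{e_0 \Downarrow^{n_0} v_0 \quad e_1[v_0/x] \Downarrow^{n_1} v}{\mathsf{let}(e_0, x.e_1) \Downarrow^{n_0+n_1} v}$$

Figure 3. Source language operational semantics.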


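$\langle n, \langle t_0, r_0\rangle, \langle t_1, r_1\rangle\rangle$ for $x$ in $e_N$, where $r_i$ is the result of the recursive call on the subtree $t_i$. In this case, to evaluate $\mathsf{rec}(N\langle n, t_0, t_1\rangle, \ldots)$, we set $e = \mathsf{delay}(\mathsf{rec}(x, \ldots))$ and compute
$$\begin{aligned} \mathsf{map}^{\mathsf{int} \times t \times t}(x.\langle x, e\rangle, \langle n, t_0, t_1\rangle) &= \langle \mathsf{map}^{\mathsf{int}}(x.\langle x, e\rangle, n), \mathsf{map}^t(x.\langle x, e\rangle, t_0), \mathsf{map}^t(x.\langle x, e\rangle, t_1)\rangle \\ &= \langle n, \langle t_0, e[t_0/x]\rangle, \langle t_1, e[t_1/x]\rangle\rangle \end{aligned}$$
and substitute the result for $x$ in $e_N$.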

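As a second example, consider the type of infinite, infinitely branching int-labeled trees:
$$\mathsf{datatype}\ \mathsf{tree}^\omega = N\ \mathsf{of}\ \mathsf{int} \times (\mathsf{nat} \to \mathsf{tree}^\omega)$$

Now consider the evaluation of $\mathsf{rec}(N\langle n, \lambda y.e_0\rangle, N \mapsto \langle z, f\rangle.e_N)$, where $\langle z, f\rangle : \mathsf{int} \times (\mathsf{nat} \to (\mathsf{tree}^\omega \times \mathsf{susp}\ \tau))$. Set $e = \mathsf{delay}(\mathsf{rec}(x, N \mapsto \langle z, f\rangle.e_N))$ and compute
$$\begin{aligned} \mathsf{map}^{\mathsf{int} \times (\mathsf{nat} \to t)}(x.\langle x, e\rangle, \langle n, \lambda y.e_0\rangle) &= \langle \mathsf{map}^{\mathsf{int}}(x.\langle x, e\rangle, n), \mathsf{map}^{\mathsf{nat} \to t}(x.\langle x, e\rangle, \lambda y.e_0)\rangle \\ &= \langle n, \lambda y.\,\mathsf{let}(e_0, z.\,\mathsf{map}^t(x.\langle x, e\rangle, z))\rangle \\ &= \langle n, \lambda y.\,\mathsf{let}(e_0, z.\langle z, e[z/x]\rangle)\rangle \\ &= \langle n, \lambda y.\langle e_0, e[e_0/x]\rangle\rangle. \end{aligned}$$

Now substitute the result for $\langle z, f\rangle$ in $e_N$. Presumably $e_N$ has a subexpression of the form $f\,q : \mathsf{tree}^\omega \times \mathsf{susp}\ \tau$. This last substitution has the moral effect of replacing $f\,q$ with $\langle e_0[q/y], e[e_0[q/y]/x]\rangle$; the first component is the subtree, and the second is the result of the recursive call at that subtree.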
The cost semantics in Figure 3 defines the relation $e \Downarrow^n v$, which means that the expression $e$ evaluates to the value $v$ in $n$ steps. Our cost model charges only for the number of function applications and recursive calls made by datatype recursors. This prevents constant-time overheads from the encoding of datatypes using product and suspension types from showing up in the extracted recurrences. It is simple to adapt the denotational cost semantics below to other operational cost semantics, such as one that charges for these steps, or one that assigns different costs to different constructs.

Substitutions are defined as usual:

Definition 2.1. We write $\theta$ for substitutions $v_1/x_1, \ldots, v_n/x_n$, and $\theta : \gamma$ to mean that $\mathrm{Dom}\,\theta \subseteq \mathrm{Dom}\,\gamma$ and $\emptyset \vdash \theta(x) : \gamma(x)$ for all $x \in \mathrm{Dom}\,\theta$. We define the application of a substitution $\theta$ to an expression $e$ as usual and denote it $e[\theta]$.

Lemma 2.2. If $x$ does not occur in $\theta$, then $e[\theta, x/x][e_1/x] = e[\theta, e_1/x]$.

For source cost expressions $n$, we write $n \le n'$ for the order given by interpreting these cost expressions as natural numbers (i.e., the free precongruence generated by the monoid equations for $(+, 0)$ and $0 \le 1$). We have the following syntactic properties of evaluation:

Lemma 2.3 (Value Evaluation).

• If $v \Downarrow^n v'$, then $n \le 0$ and $v = v'$.

• For all $v$, $v \Downarrow^0 v$.

Lemma 2.4 (Totality of map). If $\gamma \vdash \mathsf{map}^\phi(x.v_1, v_0) : \phi[\tau_1]$, then $\mathsf{map}^\phi(x.v_1, v_0) \Downarrow^0 v$ for some $v$.


3. Making Costs Explicit 


3.1. The Complexity Language. The complexity language will serve as a monadic metalanguage [Moggi, 1991] in which we make evaluation cost explicit. The syntax and typing are given in Figure 4. The preorder judgement defined in Section 5 will play a role analogous to an operational or equational semantics for the complexity language.

Because we are not concerned with the evaluation steps of the complexity language itself, we remove features of the source language that were used to control evaluation costs. Product types are eliminated by projections, rather than $\mathsf{split}$. We allow substitution of arbitrary expressions for variables, which is used in recursors for datatypes. Consequently, suspensions are not necessary. We treat $\mathsf{map}^\Phi(x.E, E_0)$ as an admissible rule (macro), defined by induction on $\Phi$:
$$\frac{\Gamma, x : T_0 \vdash E_1 : T_1 \quad \Gamma \vdash E_0 : \Phi[T_0]}{\Gamma \vdash \mathsf{map}^\Phi(x.E_1, E_0) : \Phi[T_1]}$$
$$\begin{aligned} \mathsf{map}^t(x.E, E_0) &:= E[E_0/x] \\ \mathsf{map}^T(x.E, E_0) &:= E_0 \\ \mathsf{map}^{\Phi_0 \times \Phi_1}(x.E, E_0) &:= \langle \mathsf{map}^{\Phi_0}(x.E, \pi_0 E_0), \mathsf{map}^{\Phi_1}(x.E, \pi_1 E_0)\rangle \\ \mathsf{map}^{T \to \Phi}(x.E, E_0) &:= \lambda y.\,\mathsf{map}^\Phi(x.E, E_0\,y) \end{aligned}$$

The type $\mathbf{C}$ represents some domain of costs. The term constructors for $\mathbf{C}$ say only that it is a monoid $(+, 0)$ with a value $1$ representing the cost of a single step. Costs can
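Types:
$$T ::= \mathbf{C} \mid \mathsf{unit} \mid \Delta \mid T \times T \mid T \to T$$
$$\Phi ::= t \mid T \mid \Phi \times \Phi \mid T \to \Phi$$
$$\mathsf{datatype}\ \Delta = C_0\ \mathsf{of}\ \Phi_{C_0}[\Delta] \mid \cdots \mid C_{n-1}\ \mathsf{of}\ \Phi_{C_{n-1}}[\Delta]$$

Expressions:
$$E ::= x \mid 0 \mid 1 \mid E + E \mid () \mid \langle E, E\rangle \mid \pi_0 E \mid \pi_1 E \mid \lambda x.E \mid E\,E \mid C^\Delta E \mid \mathsf{rec}^\Delta(E, C \mapsto x.E_C)$$

Typing: $\Gamma \vdash E : T$.
$$\frac{}{\Gamma, x : T \vdash x : T} \qquad \frac{}{\Gamma \vdash 0 : \mathbf{C}} \qquad \frac{}{\Gamma \vdash 1 : \mathbf{C}} \qquad \frac{\Gamma \vdash E_0 : \mathbf{C} \quad \Gamma \vdash E_1 : \mathbf{C}}{\Gamma \vdash E_0 + E_1 : \mathbf{C}}$$
$$\frac{}{\Gamma \vdash () : \mathsf{unit}} \qquad \frac{\Gamma \vdash E_0 : T_0 \quad \Gamma \vdash E_1 : T_1}{\Gamma \vdash \langle E_0, E_1\rangle : T_0 \times T_1} \qquad \frac{\Gamma \vdash E : T_0 \times T_1}{\Gamma \vdash \pi_i E : T_i}$$
$$\frac{\Gamma, x : T_0 \vdash E : T_1}{\Gamma \vdash \lambda x.E : T_0 \to T_1} \qquad \frac{\Gamma \vdash E_0 : T_0 \to T_1 \quad \Gamma \vdash E_1 : T_0}{\Gamma \vdash E_0\,E_1 : T_1}$$
$$\frac{\Gamma \vdash E : \Phi_C[\Delta]}{\Gamma \vdash C^\Delta E : \Delta} \qquad \frac{\Gamma \vdash E : \Delta \quad \forall C\,(\Gamma, x : \Phi_C[\Delta \times T] \vdash E_C : T)}{\Gamma \vdash \mathsf{rec}^\Delta(E, C \mapsto x.E_C) : T}$$

Figure 4. Complexity language types, expressions, and typing.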


be interpreted in a variety of ways, e.g., as natural numbers or as natural numbers with infinity (Section 4).

Substitutions $\Theta$ in the complexity language are defined as usual, and satisfy standard composition properties:

Lemma 3.1.

• If $x$ does not occur in $\Theta$, then $E[\Theta, x/x][E_1/x] = E[\Theta, E_1/x]$.

• If $x_1, x_2$ do not occur in $\Theta$, then $E[E_1/x_1][E_2/x_2][\Theta] = E[\Theta, E_1[\Theta]/x_1, E_2[\Theta]/x_2]$.
3.2. The Complexity Translation. A notion of complexity that considers only cost is insufficient for handling higher-order functions such as
$$\mathsf{listmap} = \lambda\langle f, xs\rangle.\,\mathsf{rec}(xs,\ \mathsf{Nil} \mapsto \mathsf{Nil} \mid \mathsf{Cons} \mapsto \langle y, \langle ys, r\rangle\rangle.\,\mathsf{Cons}\langle f\,y, \mathsf{force}(r)\rangle)$$

The cost of $\mathsf{listmap}\langle f, xs\rangle$ depends on the cost of evaluating $f$ on the elements of $xs$, and hence (indirectly) on the sizes of the elements of $xs$. And since $\mathsf{listmap}\langle f, xs\rangle$ might itself be an argument to another function (e.g., another $\mathsf{listmap}$), we also need to predict the sizes of the elements of $\mathsf{listmap}\langle f, xs\rangle$, which depend on the size of the output of $f$. Thus, to analyze $\mathsf{listmap}$, we should be given a recurrence for the cost and size of $f(x)$ in terms of the size of $x$, from which we produce a recurrence that gives the cost and size of $\mathsf{listmap}\langle f, xs\rangle$ in terms of the size of $xs$. We call the size of the value of an expression that expression's potential, because the size of the value determines what future uses of that value will cost.¹
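A sketch of this kind of higher-order recurrence, in Python: `listmap_rec` takes a recurrence for `f` (from an element potential to a cost and potential) together with the potential of the list, which here is simply the list of element potentials, since no size abstraction has happened yet. The charge of one step per recursor unfolding and one per application of `f` is an assumption meant to echo the cost model; the exact constants, and the names `listmap_rec` and `double_rec`, are illustrative.

```python
from typing import Callable, List, Tuple

ElemRec = Callable[[int], Tuple[int, int]]


def listmap_rec(f_rec: ElemRec, xs: List[int]) -> Tuple[int, List[int]]:
    """Cost and potential of listmap<f, xs>, given a recurrence for f.

    Mirrors the recursor: the Nil branch costs one unfolding; each Cons
    branch costs one unfolding, one application of f, and f's own cost.
    """
    if not xs:
        return (1, [])
    c_f, p_f = f_rec(xs[0])
    c_rest, p_rest = listmap_rec(f_rec, xs[1:])
    return (1 + 1 + c_f + c_rest, [p_f] + p_rest)


# Hypothetical element recurrence: cost n, result twice as large.
double_rec = lambda n: (n, 2 * n)

cost1, pot1 = listmap_rec(double_rec, [1, 2, 3])
print(cost1, pot1)                     # 13 [2, 4, 6]
print(listmap_rec(double_rec, pot1))   # (19, [4, 8, 12])
```

Feeding the output potential of the first call into the second is what allows one `listmap` to be analyzed after another.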

This discussion motivates translations $\langle\!\langle\cdot\rangle\!\rangle$ from source language types to complexity types and $\|\cdot\|$ from source language terms to complexity language terms so that if $e : \tau$, then $\|e\| : \mathbf{C} \times \langle\!\langle\tau\rangle\!\rangle$. In the complexity language, we call an expression of type $\langle\!\langle\tau\rangle\!\rangle$ a potential, an expression of type $\mathbf{C}$ a cost, and an expression of type $\mathbf{C} \times \langle\!\langle\tau\rangle\!\rangle$ a complexity. We abbreviate $\mathbf{C} \times \langle\!\langle\tau\rangle\!\rangle$ by $\|\tau\|$. The first component of $\|e\|$ is the cost of evaluating $e$, and the second component of $\|e\|$ is the potential of $e$.

To gain some intuition for the full definition of potential, we first consider the type-level 0 and 1 cases. At type-level 0, the potential of an expression is a measure of the size of that expression's value; it is the size of the value that determines the cost the expression contributes to the cost of future computations. Now consider a type-level 1 expression $e_0$. The use of $e_0$ is its application to a type-level 0 expression $e_1$. The cost of such an application is the sum of (i) the cost of evaluating $e_0$ to a value $\lambda x.e_0'$; (ii) the cost of evaluating $e_1$ to a value $v_1$; (iii) the cost of evaluating $e_0'[v_1/x]$; and (iv) a possible charge for the $\beta$-reduction. Since (iii) depends in part on the size of $v_1$ (i.e., the potential of $e_1$), by compositionality complexities must capture both cost and potential. Furthermore, (iii) is defined in terms of the potential of $e_0$ (i.e., the potential of $\lambda x.e_0'$). Thus the potential of a type-level 1 expression should be a map from type-level 0 potentials to type-level 0 complexities.

With this in mind, consider (the type of) $\mathsf{listmap}$. Its potential should describe what future uses of $\mathsf{listmap}$ will cost, in terms of the potentials of its arguments. For the type of $\mathsf{listmap}$ (uncurried), the above discussion suggests that $\langle\!\langle(\tau \to \sigma) \times \tau\,\mathsf{list} \to \sigma\,\mathsf{list}\rangle\!\rangle$ ought to be $(\langle\!\langle\tau\rangle\!\rangle \to \mathbf{C} \times \langle\!\langle\sigma\rangle\!\rangle) \times \langle\!\langle\tau\,\mathsf{list}\rangle\!\rangle \to \mathbf{C} \times \langle\!\langle\sigma\,\mathsf{list}\rangle\!\rangle$. For the argument function, we are provided a recurrence that maps $\tau$-potentials to $\sigma$-complexities. For the argument list, we are provided a $\tau\,\mathsf{list}$-potential. Using these, the potential of $\mathsf{listmap}$ must give the cost for doing the whole map and give a $\sigma\,\mathsf{list}$-potential for the value. This illustrates how the potential of a higher-order function is itself a higher-order function.

As discussed above, we stage the extraction of a recurrence, and in the first phase, we do not abstract values as sizes (e.g., we do not replace a list by its length). Because of this, the complexity translation has a succinct description. For any monoid $(\mathbf{C}, +, 0)$, the writer monad $\mathbf{C} \times -$ [Wadler, 1992] is a monad with
$$\begin{aligned} \mathrm{return}(E) &:= \langle 0, E\rangle \\ E_1 \mathbin{>\!\!>\!\!=} E_2 &:= \langle \pi_0(E_1) + \pi_0(E_2(\pi_1(E_1))),\ \pi_1(E_2(\pi_1(E_1)))\rangle \end{aligned}$$


¹ "Use cost" would be another reasonable term for potential.
The monad laws follow from the monoid laws for $\mathbf{C}$. Thinking of $\mathbf{C}$ as costs, these say that the cost of $\mathrm{return}(E)$ is zero, and that the cost of bind is the sum of the cost of $E_1$ and the cost of $E_2$ on the potential of $E_1$. The complexity translation is then a call-by-value monadic translation from the source language into the writer monad in the complexity language, where source expressions that cost a step have the "effect" of incrementing the cost component, using the monad operation
$$\mathrm{incr}(E : \mathbf{C}) : \mathbf{C} \times \mathsf{unit} := \langle E, ()\rangle$$
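The writer-monad structure can be checked on concrete values with a small Python sketch, taking costs to be integers and complexities to be (cost, potential) pairs. The names `ret`, `bind`, `incr`, and `add_cost` are our own; `add_cost` corresponds to adding a cost to a complexity, written $E_1 +_c E_2$ in the text. The assertions spot-check the monad laws on sample values rather than proving them.

```python
from typing import Any, Callable, Tuple

Complexity = Tuple[int, Any]   # (cost, potential)


def ret(p: Any) -> Complexity:
    """return(E) := <0, E>: returning a value costs nothing."""
    return (0, p)


def bind(m: Complexity, k: Callable[[Any], Complexity]) -> Complexity:
    """E1 >>= E2: add the cost of E1 to the cost of E2 run on E1's potential."""
    c1, p1 = m
    c2, p2 = k(p1)
    return (c1 + c2, p2)


def incr(c: int) -> Complexity:
    """incr(E) := <E, ()>: charge c steps and return unit."""
    return (c, ())


def add_cost(c: int, m: Complexity) -> Complexity:
    """Add cost c to complexity m, i.e. incr(c) >> m."""
    return bind(incr(c), lambda _: m)


# Spot-check the monad laws on sample values.
m = (4, 10)
k = lambda p: (p, p + 1)
h = lambda p: (1, 2 * p)
assert bind(ret(3), k) == k(3)                                  # left unit
assert bind(m, ret) == m                                        # right unit
assert bind(bind(m, k), h) == bind(m, lambda x: bind(k(x), h))  # associativity
print(add_cost(2, (3, "v")))  # (5, 'v')
```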

We write this translation out explicitly in Figure 5. When $E$ is a complexity, we write $E_c$ and $E_p$ for $\pi_0 E$ and $\pi_1 E$ respectively (for "cost" and "potential"). We will often need to "add cost" to a complexity; when $E_1$ is a cost and $E_2$ a complexity, we write $E_1 +_c E_2$ for the complexity $\langle E_1 + (E_2)_c, (E_2)_p\rangle$ (in monadic notation, $\mathrm{incr}(E_1) \gg E_2$). The type translation is extended pointwise to contexts, so $x : \tau \in \gamma$ iff $x : \langle\!\langle\tau\rangle\!\rangle \in \langle\!\langle\gamma\rangle\!\rangle$; the translation is call-by-value, so variables range over potentials, not complexities. For example, $\|x\| = \langle 0, x\rangle$, where the $x$ on the left is a source variable and the $x$ on the right is a potential variable. Likewise, we assume that for every datatype $\delta$ in the source signature, we have a corresponding datatype $\delta$ declared in the complexity language.

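We note some basic facts about the translation: the type translation commutes with the application of a strictly positive functor, which is used to show that the translation preserves types.

Lemma 3.2 (Compositionality).

• $\|\phi[\tau]\| = \|\phi\|[\langle\!\langle\tau\rangle\!\rangle]$

• $\langle\!\langle\phi[\tau]\rangle\!\rangle = \langle\!\langle\phi\rangle\!\rangle[\langle\!\langle\tau\rangle\!\rangle]$

Theorem 3.3. If $\gamma \vdash e : \tau$, then $\langle\!\langle\gamma\rangle\!\rangle \vdash_{\langle\!\langle\psi\rangle\!\rangle} \|e\| : \|\tau\|$.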


4. A Size-Based Complexity Semantics 


In the above translation, the potential of a value has just as much information as that value itself. Next, we investigate how to abstract values to sizes, such as replacing a list by its length. In this section, we make this replacement by defining a size-based denotational semantics of the complexity language.

We need to be able to treat potentials of inductively defined data in two different ways. On the one hand, potentials must reflect intuitions about sizes. To that end, we will insist that potentials be partial orders. On the other hand, to interpret $\mathsf{rec}$ expressions, we must be able to distinguish the datatype constructor that a potential represents. In other words, we need the potentials to also be (something like) inductive datatypes. We will have our cake and eat it too using an approach similar to the work on views [Wadler, 1987]. As hinted above, we interpret each datatype $\Delta$ in the complexity language as a partial order $[\![\Delta]\!]$. But we will also make use of the sum type $D^\Delta = \sum_C [\![\Phi_C[\Delta]]\!]$ (representing the unfolding of the datatype) and a function $\mathrm{size}_\Delta : D^\Delta \to [\![\Delta]\!]$ (which represents the size of a constructor, in terms of the size of the argument to the constructor). When $\Phi_C = t$ (i.e., the argument to the constructor is a single recursive occurrence of the datatype), $\mathrm{size}(\mathrm{inj}_C\, x)$ is intended to represent an upper bound on the size of the values of the form $C\,v$, where $v$ is a value of size at most $x$. To define the semantics of $\mathsf{rec}^\Delta(y, C \mapsto x.E_C)$, we consider all values $z \in D^\Delta$ such that $\mathrm{size}_\Delta(z) \le y$. We can distinguish between such values to
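$$\begin{aligned}
\|\tau\| &= \mathbf{C} \times \langle\!\langle\tau\rangle\!\rangle & \|\phi\| &= \mathbf{C} \times \langle\!\langle\phi\rangle\!\rangle \\
\langle\!\langle\mathsf{unit}\rangle\!\rangle &= \mathsf{unit} & \langle\!\langle t\rangle\!\rangle &= t \\
\langle\!\langle\sigma \times \tau\rangle\!\rangle &= \langle\!\langle\sigma\rangle\!\rangle \times \langle\!\langle\tau\rangle\!\rangle & \langle\!\langle\tau\rangle\!\rangle &= \langle\!\langle\tau\rangle\!\rangle \quad (\tau \text{ a constant functor}) \\
\langle\!\langle\sigma \to \tau\rangle\!\rangle &= \langle\!\langle\sigma\rangle\!\rangle \to \|\tau\| & \langle\!\langle\phi_0 \times \phi_1\rangle\!\rangle &= \langle\!\langle\phi_0\rangle\!\rangle \times \langle\!\langle\phi_1\rangle\!\rangle \\
\langle\!\langle\mathsf{susp}\ \tau\rangle\!\rangle &= \|\tau\| & \langle\!\langle\tau \to \phi\rangle\!\rangle &= \langle\!\langle\tau\rangle\!\rangle \to \|\phi\| \\
\langle\!\langle\delta\rangle\!\rangle &= \delta
\end{aligned}$$

$\langle\!\langle\psi\rangle\!\rangle$ has, for each datatype $\delta$ in $\psi$,
$$\mathsf{datatype}\ \delta = C_0\ \mathsf{of}\ \langle\!\langle\phi_{C_0}\rangle\!\rangle[\delta] \mid \cdots \mid C_{n-1}\ \mathsf{of}\ \langle\!\langle\phi_{C_{n-1}}\rangle\!\rangle[\delta]$$

$$\begin{aligned}
\|x\| &= \langle 0, x\rangle \\
\|()\| &= \langle 0, ()\rangle \\
\|\langle e_0, e_1\rangle\| &= \langle \|e_0\|_c + \|e_1\|_c,\ \langle \|e_0\|_p, \|e_1\|_p\rangle\rangle \\
\|\mathsf{split}(e_0, x_0.x_1.e_1)\| &= \|e_0\|_c +_c \|e_1\|[\pi_0\|e_0\|_p/x_0, \pi_1\|e_0\|_p/x_1] \\
\|\lambda x.e\| &= \langle 0, \lambda x.\|e\|\rangle \\
\|e_0\,e_1\| &= (1 + \|e_0\|_c + \|e_1\|_c) +_c \|e_0\|_p\,\|e_1\|_p \\
\|\mathsf{delay}(e)\| &= \langle 0, \|e\|\rangle \\
\|\mathsf{force}(e)\| &= \|e\|_c +_c \|e\|_p \\
\|C\,e\| &= \langle \|e\|_c, C\,\|e\|_p\rangle \\
\|\mathsf{rec}^\delta(e, C \mapsto x.e_C)\| &= \|e\|_c +_c \mathsf{rec}^\delta(\|e\|_p, C \mapsto x.\,1 +_c \|e_C\|) \\
\|\mathsf{map}^\phi(x.v_0, v_1)\| &= \langle 0, \mathsf{map}^{\langle\!\langle\phi\rangle\!\rangle}(x.\|v_0\|_p, \|v_1\|_p)\rangle \\
\|\mathsf{let}(e_0, x.e_1)\| &= \|e_0\|_c +_c \|e_1\|[\|e_0\|_p/x]
\end{aligned}$$

Figure 5. Translation from source types and expressions to complexity types and expressions. Recall that $\|e\|_c = \pi_0\|e\|$ and $\|e\|_p = \pi_1\|e\|$.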


(recursively) compute the possible values of the form $E_C[\cdots/x]$, and then take a maximum over all such values.
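For example, for the inductive definitions of nat and list (where the list elements have type nat), suppose we want to construe the size of a list to be the number of all nat and list constructors. We implement this in the complexity semantics as
$$\begin{aligned} [\![\mathsf{nat}]\!] &= \mathbb{Z}^+ & [\![\mathsf{list}]\!] &= \mathbb{Z}^+ \\ D^{\mathsf{nat}} &= \{*\} + [\![\mathsf{nat}]\!] & D^{\mathsf{list}} &= \{*\} + ([\![\mathsf{nat}]\!] \times [\![\mathsf{list}]\!]) \\ \mathrm{size}_{\mathsf{nat}}(*) &= 1 & \mathrm{size}_{\mathsf{list}}(*) &= 1 \\ \mathrm{size}_{\mathsf{nat}}(m) &= 1 + m & \mathrm{size}_{\mathsf{list}}(\langle m, n\rangle) &= 1 + m + n \end{aligned}$$
where $\mathbb{Z}^+$ is the non-negative integers.²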
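A Python transcription of these two size functions, assuming that a semantic value of $D^{\mathsf{nat}}$ or $D^{\mathsf{list}}$ is either the marker `STAR` (for $*$) or the size(s) carried by the non-nullary constructor. The helpers `denote_nat` and `denote_list` are hypothetical names; they compute the size of a concrete numeral or list by applying the size function once per constructor, which matches how constructors are interpreted in Figure 6.

```python
STAR = "*"   # the sole element of {*}, used for the nullary constructors Z and Nil


def size_nat(d):
    """size_nat : D^nat -> Z+, where D^nat = {*} + [[nat]]."""
    return 1 if d == STAR else 1 + d


def size_list(d):
    """size_list : D^list -> Z+, where D^list = {*} + ([[nat]] x [[list]])."""
    if d == STAR:
        return 1
    m, n = d
    return 1 + m + n


def denote_nat(k):
    """Size of the numeral S^k Z: one application of size_nat per constructor."""
    s = size_nat(STAR)          # Z
    for _ in range(k):
        s = size_nat(s)         # S
    return s


def denote_list(xs):
    """Size of a list of numerals, counting every nat and list constructor."""
    s = size_list(STAR)                       # Nil
    for x in reversed(xs):
        s = size_list((denote_nat(x), s))     # Cons
    return s


print(denote_nat(2), denote_list([2, 0]))   # 3 7
```

For example, $[2, 0]$ is built from three nat constructors for 2, one for 0, and three list constructors, giving size 7.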

We define the size-based complexity semantics as follows. The base cases for an inductive definition of $(S^T, \le_T)$ for every complexity type $T$ consist of well-founded partial orders $(S^\Delta, \le_\Delta)$ for every datatype $\Delta$ in the signature, such that $\le_\Delta$ is closed under arbitrary maximums (see below for a discussion). We define $S^{\mathbf{C}} = \mathbb{N}^\infty = \mathbb{N} \cup \{\infty\}$, where $\mathbb{N}$ is the natural numbers with the usual order and addition. We extend the order and addition to $\infty$ by $n \le_{\mathbb{N}^\infty} \infty$ and $n + \infty = \infty + n = \infty + \infty = \infty$ for all $n \in \mathbb{N}$. For unit, products, and functions we define $S^{\mathsf{unit}} = \{*\}$, $S^{T_0 \times T_1} = S^{T_0} \times S^{T_1}$, and $S^{T_0 \to T_1} = S^{T_0} \to S^{T_1}$, with the trivial, componentwise, and pointwise partial orders, respectively. Complexity types are interpreted into this type structure by setting $[\![\mathbf{C}]\!] = \mathbb{N}^\infty$ and $[\![T]\!] = S^T$ for each complexity type $T$.

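Stating the conditions on programmer-defined size functions requires some auxiliary notions. For $\mathsf{datatype}\ \Delta = C\ \mathsf{of}\ \Phi_C$, set $D^\Delta = \sum_C [\![\Phi_C[\Delta]]\!]$, writing $\mathrm{inj}_C : [\![\Phi_C[\Delta]]\!] \to D^\Delta$ for the injection. Next, we define a function $\mathrm{sz}^\Phi$ with domain $[\![\Phi[\Delta]]\!]$ (the semantic analogue of the argument type of a datatype constructor). $\mathrm{sz}^\Phi(a)$ is intended to be the maximum of the values of type $[\![\Delta]\!]$ from which $a$ is built using pairing and function application. We want to define $\mathrm{sz}^\Phi$ by induction on $\Phi$, computing the maximum at each step. To ignore values not of type $[\![\Delta]\!]$, we assume an element $\bot \notin S^\Delta$ that serves as an identity for $\vee$; that is, we order $S^\Delta \cup \{\bot\}$ so that $\bot < a$ for all $a \in S^\Delta$. We define $\mathrm{sz}^\Phi : [\![\Phi[\Delta]]\!] \to S^\Delta \cup \{\bot\}$ by induction on $\Phi$ as follows:
$$\begin{aligned} \mathrm{sz}^t(a) &= a \\ \mathrm{sz}^T(a) &= \bot \\ \mathrm{sz}^{\Phi_0 \times \Phi_1}(a_0, a_1) &= \mathrm{sz}^{\Phi_0}(a_0) \vee \mathrm{sz}^{\Phi_1}(a_1) \\ \mathrm{sz}^{T \to \Phi}(f) &= \textstyle\bigvee_{a \in [\![T]\!]} \mathrm{sz}^\Phi(f\,a) \end{aligned}$$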

The key input to the size-based semantics is programmer-supplied size functions $\mathrm{size}_\Delta : D^\Delta \to S^\Delta$ such that
$$\mathrm{sz}^{\Phi_C}(a) <_{S^\Delta \cup \{\bot\}} (\mathrm{size}_\Delta \circ \mathrm{inj}_C)(a)$$

$\mathrm{size}_\Delta$ represents the programmer's notion of size for inductively defined values. The only condition, which is used to interpret the recursor, is that the size of a value is strictly greater than the size of any of its substructures of the same type. For example, this condition permits interpreting the size of a list as its length or its total number of constructors, and the size of a tree as its number of nodes or its height. Non-examples include defining the size of a


² We refer to $\mathbb{Z}^+$ rather than the natural numbers to emphasize that the interpretation of $\delta$ need not be an inductive datatype.
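$$\mathrm{case}^\Delta : D^\Delta \times \textstyle\prod_C ([\![\Phi_C[\Delta]]\!] \to X) \to X \qquad \mathrm{case}(\mathrm{inj}_C\,x, \langle \ldots, f_C, \ldots\rangle) = f_C(x)$$
$$[\![C\,E]\!]\xi = \mathrm{size}_\Delta(\mathrm{inj}_C([\![E]\!]\xi))$$
$$[\![\mathsf{rec}(E, C \mapsto x.E_C)]\!]\xi = \bigvee_{\mathrm{size}_\Delta(z) \le [\![E]\!]\xi} \mathrm{case}(z, \langle \ldots, f_C, \ldots\rangle)$$
where
$$\begin{aligned} f_C(x) &= [\![E_C]\!]\xi\{x \mapsto [\![\mathsf{map}^{\Phi_C}(w.\langle w, \mathsf{rec}(w, C \mapsto x.E_C)\rangle, x)]\!]\xi\{x \mapsto x\}\} \\ &= [\![E_C]\!]\xi\{x \mapsto \mathrm{Map}^{\Phi_C}(\lambda a.\langle a, [\![\mathsf{rec}(w, C \mapsto x.E_C)]\!]\xi\{w \mapsto a\}\rangle, x)\} \end{aligned}$$

Figure 6. The interpretation of rec in the size-based semantics for the complexity language.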


list of natural numbers to be the number of successor constructors, and defining the size of all natural numbers to be a constant (though see Section 6.5 for a discussion of this latter possibility).

The interpretation of most terms is standard except for that of constructors and $\mathsf{rec}$, which are given in Figure 6. We write $\mathrm{Map}^\Phi$ for semantic functions that mirror the definition of $\mathsf{map}$, and we overload the notation $C_i$ to stand for $\mathrm{inj}_{C_i} : [\![\Phi_{C_i}[\Delta]]\!] \to D^\Delta$. The implementation of the recursors requires a bit of explanation, and is motivated by the goal to have $\|e\|$ bound the cost and potential of $e$. We expect that $[\![\|\mathsf{rec}^\delta(e, C \mapsto x.e_C)\|]\!]$, which depends on $[\![\mathsf{rec}^\delta(\|e\|_p, C \mapsto x.\|e_C\|)]\!]$, should branch on $[\![\|e\|_p]\!]$, evaluating to the appropriate $[\![\|e_C\|]\!]$. However, $[\![\|e\|_p]\!]$ will be a semantic value of type $S^\delta$, whereas to branch, we need a semantic value of type $D^\delta$. Furthermore, $[\![\|e\|_p]\!]$ is only an upper bound on the size of $e$, so we cannot use $[\![\|e\|_p]\!]$ to predict which branch the evaluation of the source $\mathsf{rec}$ expression will follow. We solve these problems by introducing a semantic case function, and define the denotation of $\mathsf{rec}$ expressions by taking a maximum over the branches for all semantic values that are bounded by the upper bound $[\![\|e\|_p]\!]$. This is the source of the requirement that base-type potentials be closed under arbitrary maximums. Although this requirement seems rather strong, in most examples it seems easy to satisfy. In particular, we think of most datatype potentials (sizes) as being natural numbers, and so we satisfy the condition by interpreting them by $\mathbb{N}^\infty$.

The restriction on $\mathit{size}_\delta$ ensures that the recursion used to interpret rec expressions descends along a well-founded partial order, and hence is well-defined. The maximum may end up being a maximum over all possible values, but this simply indicates that our interpretation fails to give us precise information.

We illustrate this semantics on some examples. In order to ease the notation, we will occasionally write syntactic expressions for the corresponding semantic values (in effect, dropping $[\![\cdot]\!]$). We also write the case function as a branch on constructors; for example, we write $\mathit{case}(t, \mathrm{Emp} \mapsto x.(1, 1) \mid \mathrm{Node} \mapsto (y, t_0, t_1).e)$ for $\mathit{case}(t, \lambda x.(1, 1), \lambda(y, t_0, t_1).e)$.
4.1. Booleans and Conditionals. In the source language we define booleans and their case construct:

    datatype bool = True of unit | False of unit

$$\mathrm{case}(e^{\mathrm{bool}}, e_0, e_1) = \mathrm{rec}(e, \mathrm{True} \mapsto e_0 \mid \mathrm{False} \mapsto e_1)$$

(recall our convention of writing $e_C$ for $x.e_C$ when $\phi_C = \mathrm{unit}$). In the semantics of the complexity language, we interpret bool as a one-element set $\{1\}$, so True and False are indistinguishable by “size.” Our interpretation yields


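$$\begin{aligned}
[\![\|\mathrm{case}(e, e_0, e_1)\|]\!] &= \|e\|_c +_c \mathrm{rec}(\|e\|_p, \mathrm{True} \mapsto 1 +_c \|e_0\| \mid \mathrm{False} \mapsto 1 +_c \|e_1\|) \\
&= \|e\|_c +_c \bigvee_{\mathit{size}\,b \le \|e\|_p} \mathit{case}(b, \mathrm{True} \mapsto 1 +_c \|e_0\| \mid \mathrm{False} \mapsto 1 +_c \|e_1\|) \\
&= \|e\|_c +_c \big(\mathit{case}(\mathrm{True}, \mathrm{True} \mapsto 1 +_c \|e_0\| \mid \mathrm{False} \mapsto 1 +_c \|e_1\|) \\
&\qquad\qquad \vee\ \mathit{case}(\mathrm{False}, \mathrm{True} \mapsto 1 +_c \|e_0\| \mid \mathrm{False} \mapsto 1 +_c \|e_1\|)\big) \\
&= (1 + \|e\|_c) +_c (\|e_0\| \vee \|e_1\|).
\end{aligned}$$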


In other words, if we cannot distinguish between True and False by size, then the interpretation of a conditional is just the maximum of its branches (with the additional cost of evaluating the test). This is precisely the interpretation used by Danner et al. [2013].
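The bounded join used above can be mimicked numerically. The following Python sketch is purely illustrative and not part of the formal development: it tracks costs only (ignoring the potential components of the branches), and the names `bounded_join` and `conditional_cost` are hypothetical. It takes the join over every constructor whose size is within the bound, with True and False both assigned size 1, as in the one-element interpretation $\{1\}$ of bool.

```python
from typing import Callable, Dict


def bounded_join(bound: int, sizes: Dict[str, int],
                 branch: Callable[[str], int]) -> int:
    """Maximum of branch(c) over all constructors c with sizes[c] <= bound.

    Mirrors the size-based interpretation of rec: only an upper bound on the
    scrutinee's size is known, so every constructor within the bound is a
    possible branch. The empty join is 0, the least cost.
    """
    return max((branch(c) for c, s in sizes.items() if s <= bound), default=0)


def conditional_cost(test_cost: int, true_cost: int, false_cost: int) -> int:
    """Cost component of ||case(e, e0, e1)|| when bool is interpreted as {1}."""
    sizes = {"True": 1, "False": 1}   # True and False are indistinguishable by size
    test_potential = 1                # the only element of the one-element set
    # Each branch pays 1 for the rec reduction (the "1 +_c" in the translation).
    branch_costs = {"True": 1 + true_cost, "False": 1 + false_cost}
    return test_cost + bounded_join(test_potential, sizes, branch_costs.__getitem__)


print(conditional_cost(3, 2, 5))  # (1 + 3) + max(2, 5) = 9
```

Because both constructors fit under the bound, both branches always contribute, which is exactly why the result is the maximum of the two branch costs plus the cost of the test.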


4.2. Tree Membership. Next we consider an example that shows that the “big” maximum used to interpret the recursor can typically be simplified to the recurrence that one expects to see. We analyze the cost of checking membership in an int-labeled tree. We write $e_0\ \mathrm{orelse}\ e_1$ as an abbreviation for $\mathrm{case}(e_0, \mathrm{True} \mapsto \mathrm{True} \mid \mathrm{False} \mapsto e_1)$.

    datatype tree = Emp of unit | Node of int × tree × tree

    mem(t, x) = rec(t,
                    Emp  ↦ False
                  | Node ↦ (y, (t0, r0), (t1, r1)).
                             y = x orelse (force r0 orelse force r1))

For this example, we treat int (in the source and complexity languages) as a datatype with one constructor for each fixed-size integer, where the equality test x = y is implemented by a rather large case analysis. Let us define the size of a tree to be the number of nodes:

$$[\![\mathrm{tree}]\!] = \mathbb{N}^\infty \qquad \mathit{size}_{\mathrm{tree}}(\mathrm{Emp}) = 0 \qquad \mathit{size}_{\mathrm{tree}}(\mathrm{Node}(1, n_0, n_1)) = 1 + n_0 + n_1$$

We would like to get the following recurrence for the cost of the rec expression when t has size n:

$$T(0) = 1 \qquad T(n) = \bigvee_{n_0 + n_1 + 1 = n} 6 + T(n_0) + T(n_1)$$

(x = y requires an application and two case evaluations; each orelse evaluation costs 1; and we charge for the rec reduction).

Working through the interpretation yields $[\![\|\mathrm{mem}(t, x)\|]\!]_c = \|t\|_c + g(\|t\|_p) + 1$, where $g(n) = [\![\mathrm{rec}(z, \mathrm{Emp} \mapsto 1 \mid \mathrm{Node} \mapsto (y, (t_0, r_0), (t_1, r_1)).\, 6 + (r_0)_c + (r_1)_c)]\!]\{z \mapsto n\}$.
We can calculate that $g(0) = 1$, and for $n > 0$:

$$\begin{aligned}
g(n) &= \bigvee_{\mathit{size}\,t \le n} \mathit{case}(t, \mathrm{Emp} \mapsto 1 \mid \mathrm{Node} \mapsto (y, n_0, n_1).\, 6 + g(n_0) + g(n_1)) \\
&= g(n-1) \vee \bigvee_{\mathit{size}\,t = n} \mathit{case}(t, \ldots) \\
&= g(n-1) \vee \bigvee_{1 + n_0 + n_1 = n} \mathit{case}(\mathrm{Node}(1, n_0, n_1), \ldots) \\
&= g(n-1) \vee \bigvee_{1 + n_0 + n_1 = n} (6 + g(n_0) + g(n_1))
\end{aligned}$$

We now notice that when we take $n_0 = 0$ and $n_1 = n - 1$ we have

$$6 + g(n_0) + g(n_1) = 6 + g(0) + g(n-1) \ge g(n-1)$$

and hence

$$g(n) = g(n-1) \vee \bigvee_{1 + n_0 + n_1 = n} (6 + g(n_0) + g(n_1)) = \bigvee_{1 + n_0 + n_1 = n} (6 + g(n_0) + g(n_1)),$$

which is precisely the recurrence we would expect.
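The simplification can also be checked numerically. The Python sketch below is an illustration only (the names `g` and `g_join` are ours): it computes g both from the simplified recurrence and from the unsimplified join over all tree sizes $s \le n$, grouping the Node cases by size. Solving the recurrence by hand gives $g(n) = 7n + 1$, since $6 + (7n_0 + 1) + (7n_1 + 1) = 7(1 + n_0 + n_1) + 1$ for every split; the loop confirms this for small n.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def g(n: int) -> int:
    """Simplified recurrence: g(0) = 1 and g(n) = max over 1 + n0 + n1 = n of 6 + g(n0) + g(n1)."""
    if n == 0:
        return 1
    return max(6 + g(n0) + g(n - 1 - n0) for n0 in range(n))


@lru_cache(maxsize=None)
def g_join(n: int) -> int:
    """Unsimplified form: join of the case branches over all trees of size s <= n."""
    def branch(s: int) -> int:
        if s == 0:                                   # Emp is the only tree of size 0
            return 1
        # Node(1, n0, n1) with 1 + n0 + n1 = s; here n0, n1 < s <= n.
        return max(6 + g_join(n0) + g_join(s - 1 - n0) for n0 in range(s))
    return max(branch(s) for s in range(n + 1))


for n in range(12):
    assert g(n) == g_join(n) == 7 * n + 1
```

Every split $(n_0, n_1)$ yields the same value here, so the maximum in this particular recurrence is attained by every tree shape of a given size.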

4.3. Tree Map. Next, we consider an example that illustrates reasoning about higher-order 
functions and the benefits of choosing an appropriate notion of size. We analyze the cost of 
the map function for nat-labeled binary trees: 

    treemap(f, t) = rec(t,
                        Emp  ↦ Emp
                      | Node ↦ (y, (t0, r0), (t1, r1)).
                                 Node(f(y), force r0, force r1))

Suppose the cost of evaluating f is monotone with respect to the size of its argument, where we define the size of a natural number n to be 1 + n (to count the zero constructor). The cost of evaluating treemap(f, t) should be bounded by $1 + n \cdot (1 + f(s)_c)$, where n is the number of nodes in t, s is the maximum size of all labels in t, and we write $f(s)_c$ for the cost of evaluating f on a natural number of size s (the map runs f on an input of size at most s for each of the n nodes, and takes an additional n steps to traverse the tree).

We take $[\![\mathrm{tree}]\!] = \mathbb{N}^\infty \times \mathbb{N}^\infty$, where we think of the pair $(n, s)$ as (number of nodes, maximum size of label), and use the componentwise (product) ordering on pairs: $(n, s) < (n', s')$ iff $n < n'$ and $s \le s'$, or $n \le n'$ and $s < s'$. The size function is defined as follows:

$$\mathit{size}(\mathrm{Emp}) = (0, 0) \qquad \mathit{size}(\mathrm{Node}(n, (n_0, s_0), (n_1, s_1))) = (1 + n_0 + n_1, \max\{n, s_0, s_1\}).$$
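Let us write $g(m, s) = [\![\mathrm{rec}(z, \ldots)]\!]\{z \mapsto (m, s)\}$, so that $([\![\|\mathrm{treemap}\|]\!](f, (m, s)))_c = g(m, s) + 1$. We now show that $g(m, s) \le m(1 + f(s)_c)$ by induction:

$$\begin{aligned}
g(m, s) &= \bigvee_{\mathit{size}\,z \le (m, s)} \mathit{case}(z, \mathrm{Emp} \mapsto 1 \mid \mathrm{Node} \mapsto (n, (n_0, s_0), (n_1, s_1)).\, (1 + (f(n))_c + (g(n_0, s_0))_c + (g(n_1, s_1))_c)) \\
&= 1 \vee \bigvee_{\substack{1 + n_0 + n_1 \le m \\ \max\{n, s_0, s_1\} \le s}} (1 + f(n)_c + (g(n_0, s_0))_c + (g(n_1, s_1))_c) \\
&\le \bigvee_{\substack{1 + n_0 + n_1 \le m \\ \max\{n, s_0, s_1\} \le s}} (1 + f(n)_c + n_0 \cdot (1 + f(s_0)_c) + n_1 \cdot (1 + f(s_1)_c)) \\
&\le \bigvee_{\substack{1 + n_0 + n_1 \le m \\ \max\{n, s_0, s_1\} \le s}} (1 + n_0 + n_1)(1 + f(\max\{n, s_0, s_1\})_c) \\
&\le m \cdot (1 + f(s)_c).
\end{aligned}$$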


4.4. The Bounding Theorem for the Size-Based Semantics. The most basic correctness criterion for our technique is that a closed source program’s operational cost is bounded by the cost component of the denotation of its complexity translation. However, to know that extracted recurrences are correct, it is not enough to consider closed programs; we also need to know that the potential of a function bounds that function’s operational cost on all arguments, and so on at higher type. Thus, we use a logical relation. We first show a simplified case of the logical relation, where for this subsection only we do not allow datatype constructors to take functions as arguments (i.e., we drop the $\tau \to \phi$ clause from the grammar of constructor argument types $\phi$). In Section 5 we consider the general case, which requires some non-trivial technical additions to the main definition.

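Definition 4.1 (Bounding relation).

(1) Let e be a closed source language expression and a a semantic value. We write $e \sqsubseteq_\tau a$ to mean: if $e \Downarrow^n v$, then

(a) $n \le a_c$, and

(b) $v \sqsubseteq^{\mathrm{val}}_\tau a_p$.

(2) Let v be a source language value and a a semantic value. We define $v \sqsubseteq^{\mathrm{val}}_\tau a$ by:

(a) $\langle\rangle \sqsubseteq^{\mathrm{val}}_{\mathrm{unit}} a$ (for every a).

(b) $(v_0, v_1) \sqsubseteq^{\mathrm{val}}_{\tau_0 \times \tau_1} (a_0, a_1)$ if $v_i \sqsubseteq^{\mathrm{val}}_{\tau_i} a_i$ for $i = 0, 1$.

(c) $\mathrm{delay}(e) \sqsubseteq^{\mathrm{val}}_{\mathrm{susp}\,\tau} a$ if $e \sqsubseteq_\tau a$.

(d) $C(v) \sqsubseteq^{\mathrm{val}}_\delta a$ if there is $a'$ such that $v \sqsubseteq^{\mathrm{val}}_{\phi_C[\delta]} a'$ and $\mathit{size}(C(a')) \le a$.

(e) $\lambda x.e \sqsubseteq^{\mathrm{val}}_{\sigma \to \tau} a$ if whenever $v \sqsubseteq^{\mathrm{val}}_\sigma a'$, $e[v/x] \sqsubseteq_\tau a(a')$.

^Our restriction on the form of $\phi_C$ allows us to conclude that this definition is well-founded, even though the type gets bigger in clause (d), because we can treat the definition of $\sqsubseteq^{\mathrm{val}}_\delta$ as an inner induction on the values. Allowing datatype constructors to take function arguments complicates the situation, and in Section 5 we must define a more general relation.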

Theorem 4.2 (Bounding theorem). If $e : \tau$ in the source language, then $e \sqsubseteq_\tau [\![\|e\|]\!]$.

Rather than proving this bounding theorem directly, in Section 5 we identify syntactic constraints on the complexity language which allow the proof to be carried through (Theorem 5.7). Because the size-based semantics satisfies these syntactic constraints (see Section 6.1), we can prove that the logical relation defined in Section 5 implies the one defined above, giving Theorem 4.2 as a corollary.
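As an informal illustration of what the bounding theorem guarantees for closed programs, the Python sketch below evaluates the rec in mem from Section 4.2 on concrete trees while counting cost under the accounting described there: 1 per rec reduction, 3 for x = y, and 1 per orelse. We additionally assume (our simplification, not the paper's operational semantics) that delay and force are free and that a recursive result is computed only when its force runs. All names are hypothetical. The measured cost never exceeds $g(n) = 7n + 1$ for a tree with n nodes, and equals it when the key is absent.

```python
import random
from typing import Optional, Tuple

# A tree is None (Emp) or a triple (label, left, right) (Node).
Tree = Optional[Tuple[int, "Tree", "Tree"]]


def mem_cost(t: Tree, x: int) -> Tuple[bool, int]:
    """Evaluate the rec in mem(t, x), returning (result, cost).

    Hypothetical accounting: 1 per rec reduction, 3 for x = y, 1 per orelse;
    delay/force are free. Subtrees are visited only when forced, so the
    orelse short-circuits skip work.
    """
    if t is None:
        return False, 1                  # Emp branch: just the rec reduction
    y, t0, t1 = t
    cost = 1 + 3 + 1                     # rec reduction, x = y, outer orelse
    if y == x:
        return True, cost
    cost += 1                            # inner orelse
    found0, c0 = mem_cost(t0, x)         # force r0
    if found0:
        return True, cost + c0
    found1, c1 = mem_cost(t1, x)         # force r1
    return found1, cost + c0 + c1


def random_tree(n: int, rng: random.Random) -> Tree:
    """A random tree with exactly n nodes and labels in 0..9."""
    if n == 0:
        return None
    n0 = rng.randrange(n)                # number of nodes in the left subtree
    return (rng.randrange(10), random_tree(n0, rng), random_tree(n - 1 - n0, rng))


rng = random.Random(0)
for n in range(30):
    t = random_tree(n, rng)
    for x in range(12):                  # labels 10 and 11 never occur in t
        _, cost = mem_cost(t, x)
        assert cost <= 7 * n + 1         # the extracted bound g(n)
    assert mem_cost(t, 11)[1] == 7 * n + 1   # absent key: the bound is attained
```

The short-circuiting of orelse is what makes the operational cost strictly smaller than the bound on successful searches; the extracted recurrence cannot see which branch is taken and so charges for the worst case.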

5. The Syntactic Bounding Theorem 

Rather than proving the bounding theorem for a particular model, such as the one from the previous section, we use a syntactic judgement $\Gamma \vdash E_0 \le_T E_1$ to axiomatize the properties that are necessary to prove the theorem. The rules are in Figure 7; we omit typing premises from the figure, but formally each rule has sufficient premises to make the two terms have the indicated type. The first two rules state reflexivity and transitivity. The next rule (congruence) says that term contexts of a certain form (in the sequel, congruence contexts) are monotonic. The next three rules state the monoid laws for $\mathbb{C}$; we write $E_0 =_{\mathbb{C}} E_1$ to abbreviate the two rules $E_0 \le E_1$ and $E_1 \le E_0$. The final three rules (which we call “step rules”) say that a β-redex is bigger than or equal to its reduct. The first five congruence contexts are the standard head elimination contexts used in logical relations arguments (principal arguments of elimination forms) and the next two say that + is monotone.

These preorder rules are sufficient to prove the bounding theorem, and permit a variety of interpretations and extensions. If we impose no further rules, then $E_0 \le E_1$ is basically weak head reduction from $E_1$ to $E_0$ (plus the monoid laws for $\mathbb{C}$). We can also add rules that identify elements of datatypes, in order to make those elements behave like sizes. For example, for lists of ints, we can say

$$E \le \mathrm{Cons}(\_, E) \qquad \mathrm{Cons}(E_1, E) \le \mathrm{Cons}(E_2, E)$$

and extend the congruence contexts with $\mathrm{Cons}(x, C)$. Then the second rule equates any two lists with the same number of elements, quotienting them to natural numbers, and the first rule orders these natural numbers by the usual less-than. Thus, considered up to $\le$, lists are lengths.
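To see concretely that these rules quotient lists to their lengths, the Python sketch below computes the least relation closed under the two rules, the extended congruence context $\mathrm{Cons}(x, C)$, reflexivity, and transitivity, on the finite set of lists of length at most 3 over a two-element alphabet standing in for int. This bounded check is our own illustration; `MAX_LEN`, `ALPHABET`, and `derivable_order` are hypothetical names. Lists are encoded as tuples, with $\mathrm{Cons}(h, E)$ as `(h,) + E`.

```python
from itertools import product

ALPHABET = (0, 1)   # stand-in for the elements of int
MAX_LEN = 3
UNIVERSE = [p for k in range(MAX_LEN + 1) for p in product(ALPHABET, repeat=k)]


def derivable_order() -> set:
    """Least relation on UNIVERSE closed under the list rules.

    Rules: E <= Cons(_, E); Cons(E1, E) <= Cons(E2, E); congruence for
    Cons(x, C); reflexivity; transitivity. Cons(h, E) is encoded as (h,) + E.
    """
    rel = {(e, e) for e in UNIVERSE}                                # reflexivity
    while True:
        new = set()
        for e in UNIVERSE:
            if len(e) < MAX_LEN:
                new.update((e, (h,) + e) for h in ALPHABET)          # E <= Cons(_, E)
            if e:
                new.update(((h,) + e[1:], e) for h in ALPHABET)      # change the head
        for e0, e1 in rel:
            if len(e0) < MAX_LEN and len(e1) < MAX_LEN:
                new.update(((x,) + e0, (x,) + e1) for x in ALPHABET)  # Cons(x, C)
        for e0, e1 in rel:
            for f0, f1 in rel:
                if e1 == f0:
                    new.add((e0, f1))                                 # transitivity
        if new <= rel:
            return rel
        rel |= new


rel = derivable_order()
# Up to <=, lists are exactly their lengths.
assert all(((a, b) in rel) == (len(a) <= len(b)) for a in UNIVERSE for b in UNIVERSE)
```

Every rule preserves "length of the left side is at most length of the right side", and conversely the head-change rule combined with congruence lets any element be rewritten, so the derivable order coincides with comparison of lengths.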

Combining these rules with the ones used to prove the bounding theorem, the recursor for lists behaves like a monotonization of the original recursion (like the $\bigvee$ in the size-based complexity semantics). For example, for any specific list value $\mathrm{Cons}(x, xs)$, by the usual step rule, we have

$$E_1[(x, xs, \mathrm{rec}(xs, \mathrm{Nil} \mapsto E_0, \mathrm{Cons} \mapsto p.E_1))/p] \le \mathrm{rec}(\mathrm{Cons}(x, xs), \mathrm{Nil} \mapsto E_0, \mathrm{Cons} \mapsto p.E_1).$$

But we can derive $\mathrm{Nil} \le \mathrm{Cons}(x, xs)$, so we also have

$$\begin{aligned}
\mathrm{rec}(\mathrm{Nil}, \ldots) &\le \mathrm{rec}(\mathrm{Cons}(x, xs), \ldots) && \text{by congruence} \\
E_0 &\le \mathrm{rec}(\mathrm{Nil}, \mathrm{Nil} \mapsto E_0, \mathrm{Cons} \mapsto p.E_1) && \text{by the step rule} \\
E_0 &\le \mathrm{rec}(\mathrm{Cons}(x, xs), \mathrm{Nil} \mapsto E_0, \mathrm{Cons} \mapsto p.E_1) && \text{by transitivity}
\end{aligned}$$

and similarly for non-empty lists that are $\le \mathrm{Cons}(x, xs)$. Thus, when we quotient lists to their lengths, the congruence and step rules for rec (used to prove the bounding theorem) imply that the recursor is bigger than all of the branches for all smaller lists. This is in
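$$C ::= [\,] \mid \pi_0 C \mid \pi_1 C \mid C\,E \mid \mathrm{rec}(C, \overline{C \mapsto x.E_C}) \mid C + E \mid E + C$$

$$\frac{}{\Gamma \vdash E \le_T E}\ (\text{reflexivity}) \qquad \frac{\Gamma \vdash E_0 \le_T E_1 \qquad \Gamma \vdash E_1 \le_T E_2}{\Gamma \vdash E_0 \le_T E_2}\ (\text{transitivity})$$

$$\frac{\Gamma, x : T' \vdash C[x] : T \qquad \Gamma \vdash E_0 \le_{T'} E_1}{\Gamma \vdash C[E_0] \le_T C[E_1]}\ (\text{congruence})$$

$$\Gamma \vdash 0 + E =_{\mathbb{C}} E \qquad \Gamma \vdash E + 0 =_{\mathbb{C}} E \qquad \Gamma \vdash (E_0 + E_1) + E_2 =_{\mathbb{C}} E_0 + (E_1 + E_2)$$

$$\Gamma \vdash E_0[E_1/x] \le_T (\lambda x.E_0)\,E_1 \qquad \Gamma \vdash E_i \le_{T_i} \pi_i(E_0, E_1)$$

$$\frac{C : (\Phi \to \delta) \in \psi}{\Gamma \vdash E_C[\mathrm{map}^{\Phi}(y.(y, \mathrm{rec}(y, \overline{C \mapsto x.E_C})), E_0)/x] \le_T \mathrm{rec}^{\delta}(C\,E_0, \overline{C \mapsto x.E_C})}$$

Figure 7. Congruence contexts and the preorder judgement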


contrast to the interpretation of the recursor-like construct given by Danner et al. [2013], which includes an explicit maximization that includes the base case.

In Section 4 we used reasoning in the size-based semantics to massage the recurrence extracted from a program into a recognizable and solvable form. In future work, we plan to investigate how to do this massaging within the syntax of the complexity language, using the rules we have just discussed and others. For example, while a recursion bounds what it steps to on all smaller values, we do not yet have a rule stating that it is a least upper bound. Here, we lay a foundation for this by proving the bounding theorem for the small set of rules in Figure 7.


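5.1. The Bounding Relation. First, we extend Definition 4.1 to arbitrary datatypes. Fix a signature $\psi$. We will mutually define the following relations in Definition 5.1:

(1) $e \sqsubseteq_\tau E$, where $\emptyset \vdash e : \tau$ and $\emptyset \vdash_{\|\psi\|} E : \|\tau\|$.

(2) $v \sqsubseteq^{\mathrm{val}}_\tau E$, where $\emptyset \vdash v : \tau$ and $\emptyset \vdash_{\|\psi\|} E : \langle\langle\tau\rangle\rangle$.

(3) $v \sqsubseteq^{\mathrm{val}}_{\phi,R} E$, where $\emptyset \vdash v : \phi[\delta]$ and $\emptyset \vdash_{\|\psi\|} E : \langle\langle\phi\rangle\rangle[\delta]$.

(4) $e \sqsubseteq_{\phi,R} E$, where $\emptyset \vdash e : \phi[\delta]$ and $\emptyset \vdash_{\|\psi\|} E : \|\phi\|[\delta]$.

In (3) and (4), $R(\emptyset \vdash v : \delta,\ \emptyset \vdash_{\|\psi\|} E : \delta)$ is any relation; these parts interpret strictly positive functors as relation transformers.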

The definition is by induction on $\tau$ and $\phi$. For datatypes, the signature well-formedness relation $\psi\ \mathrm{sig}$ ensures that datatypes are ordered, where later ones can refer to earlier ones, but not vice versa. Therefore, we could “inline” all datatype declarations: rather than naming datatypes, we could replace each datatype name $\delta$ by an inductive type $\mu[\overline{C \text{ of } \phi}]$. The logical relation is defined using the subterm ordering for this “inlined” syntax. In addition to the usual subterm ordering for types $\tau$ and functors $\phi$, we have that datatypes that occur earlier in $\psi$ are smaller than later ones, and if $C : (\phi \to \delta) \in \psi$, then $\phi$ is smaller than $\delta$.

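Definition 5.1 (Bounding relation).

(1) We write $e \sqsubseteq_\tau E$ to mean: if $e \Downarrow^n v$, then
• $n \le E_c$ and
• $v \sqsubseteq^{\mathrm{val}}_\tau E_p$.

(2) We write $v \sqsubseteq^{\mathrm{val}}_\tau E$ to mean:
• $v \sqsubseteq^{\mathrm{val}}_{\mathrm{unit}} E$ is always true.
• $(v_1, v_2) \sqsubseteq^{\mathrm{val}}_{\tau_1 \times \tau_2} E$ iff $v_1 \sqsubseteq^{\mathrm{val}}_{\tau_1} \pi_0 E$ and $v_2 \sqsubseteq^{\mathrm{val}}_{\tau_2} \pi_1 E$.
• $\mathrm{delay}(e) \sqsubseteq^{\mathrm{val}}_{\mathrm{susp}\,\tau} E$ iff $e \sqsubseteq_\tau E$.
• $v \sqsubseteq^{\mathrm{val}}_\delta E$ is inductively defined by
$$\frac{C : (\phi \to \delta) \in \psi \qquad v \sqsubseteq^{\mathrm{val}}_{\phi,\, \sqsubseteq^{\mathrm{val}}_\delta} E' \qquad C\,E' \le_\delta E}{C\,v \sqsubseteq^{\mathrm{val}}_\delta E}$$
• $\lambda x.e \sqsubseteq^{\mathrm{val}}_{\tau_1 \to \tau_2} E$ iff (for all $v_1$ and $E_1$, if $v_1 \sqsubseteq^{\mathrm{val}}_{\tau_1} E_1$ then $e[v_1/x] \sqsubseteq_{\tau_2} E\,E_1$).

(3) We write $v \sqsubseteq^{\mathrm{val}}_{\phi,R} E$ to mean:
• $v \sqsubseteq^{\mathrm{val}}_{t,R} E$ if $R(v, E)$.
• $v \sqsubseteq^{\mathrm{val}}_{\tau,R} E$ if $v \sqsubseteq^{\mathrm{val}}_\tau E$ ($t$ not free in $\tau$).
• $(v, v') \sqsubseteq^{\mathrm{val}}_{\phi_0 \times \phi_1,R} E$ if $v \sqsubseteq^{\mathrm{val}}_{\phi_0,R} \pi_0 E$ and $v' \sqsubseteq^{\mathrm{val}}_{\phi_1,R} \pi_1 E$.
• $\lambda x.e_1 \sqsubseteq^{\mathrm{val}}_{\tau \to \phi,R} E_1$ if for all $v$ and $E'$, if $v \sqsubseteq^{\mathrm{val}}_\tau E'$ then $e_1[v/x] \sqsubseteq_{\phi,R} (E_1\,E')$.

(4) We write $e \sqsubseteq_{\phi,R} E$ to mean: if $e \Downarrow^n v$, then
• $n \le E_c$ and
• $v \sqsubseteq^{\mathrm{val}}_{\phi,R} E_p$.

The inner inductive definition of $v \sqsubseteq^{\mathrm{val}}_\delta E$ makes sense because $R$ occurs strictly positively in $- \sqsubseteq^{\mathrm{val}}_{\phi,R} -$, and because (by signature formation) $\delta$ cannot occur in $\phi$, so $- \sqsubseteq^{\mathrm{val}}_\delta -$ does not occur elsewhere in $- \sqsubseteq^{\mathrm{val}}_{\phi,R} -$. The relation on open terms considers all closed instances:

(5) For a source substitution $\theta : \gamma$ and complexity substitution $\Theta : \Gamma$, we write $\theta \sqsubseteq^{\mathrm{sub}} \Theta$ to mean that for all $(x : \tau) \in \gamma$, $\theta(x) \sqsubseteq^{\mathrm{val}}_\tau \Theta(x)$.

(6) For $\gamma \vdash e : \tau$ and $\Gamma \vdash E : \|\tau\|$, we write $e \sqsubseteq_\tau E$ to mean that for all $\theta : \gamma$ and $\Theta : \Gamma$, if $\theta \sqsubseteq^{\mathrm{sub}} \Theta$, then $e[\theta] \sqsubseteq_\tau E[\Theta]$.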


We write $\mathcal{E} :: J$ to mean that $\mathcal{E}$ is a derivation of any of the judgements $J$ just described. Because the relation for function types is a function between relations, derivations are infinitely-branching trees. A subderivation of such an $\mathcal{E}$ is any subtree of $\mathcal{E}$, which includes any application of a $\to$-type judgement. For example, if $\mathcal{E}_1 :: \lambda x.e_1 \sqsubseteq^{\mathrm{val}}_{\tau \to \tau'} E_1$ and $\mathcal{E} :: v \sqsubseteq^{\mathrm{val}}_\tau E$, then the derivation of $e_1[v/x] \sqsubseteq_{\tau'} E_1\,E$ is a subderivation of $\mathcal{E}_1$.

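Next, we establish some basic properties of the relation:

Lemma 5.2 (Weakening).

(1) If $e \sqsubseteq_\tau E$ and $E \le_{\|\tau\|} E'$ then $e \sqsubseteq_\tau E'$.

(2) If $v \sqsubseteq^{\mathrm{val}}_\tau E$ and $E \le_{\langle\langle\tau\rangle\rangle} E'$ then $v \sqsubseteq^{\mathrm{val}}_\tau E'$.

Proof. We prove both clauses simultaneously by induction on $\tau$, using congruence for $\pi_0 [\,]$, $\pi_1 [\,]$ and $[\,]\,E$.

(1) Suppose $e \sqsubseteq_\tau E$ and $E \le_{\mathbb{C} \times \langle\langle\tau\rangle\rangle} E'$. We need to show $e \sqsubseteq_\tau E'$, so assume $e \Downarrow^n v$. Because $e \sqsubseteq_\tau E$ we have that $n \le_{\mathbb{C}} E_c$ and $v \sqsubseteq^{\mathrm{val}}_\tau E_p$, so it suffices to show $E_c \le_{\mathbb{C}} E'_c$ and $E_p \le_{\langle\langle\tau\rangle\rangle} E'_p$. Recalling that $-_c$ and $-_p$ are really just $\pi_0$ and $\pi_1$, these are true using the congruence rule with $x.\pi_0 x$ and $x.\pi_1 x$ on $E \le_{\mathbb{C} \times \langle\langle\tau\rangle\rangle} E'$.

(2) Case: $\tau_1 \times \tau_2$. By the induction hypotheses, it suffices to show that the assumption $E \le_{\langle\langle\tau_1\rangle\rangle \times \langle\langle\tau_2\rangle\rangle} E'$ implies $\pi_0 E \le_{\langle\langle\tau_1\rangle\rangle} \pi_0 E'$ and similarly for $\pi_1$. Apply the congruence rule with $x.\pi_0 x$.

Case: $\mathrm{susp}\,\tau$. Immediate by the induction hypothesis (1).

Case: $\tau_1 \to \tau_2$. Using the induction hypothesis (1) on $\tau_2$, it suffices to show that the assumption $E \le E'$ implies $E\,E_1 \le_{\|\tau_2\|} E'\,E_1$. Use the congruence rule with $f.f\,E_1$.

Case: $\delta$. Because weakening is built into the definition (through the premise $C\,E' \le_\delta E$), this is immediate by transitivity.

□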


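Lemma 5.3 (Compositionality).

(1) $e \sqsubseteq_{\phi[\tau]} E$ iff $e \sqsubseteq_{\phi,\, \sqsubseteq^{\mathrm{val}}_\tau} E$.

(2) $v \sqsubseteq^{\mathrm{val}}_{\phi[\tau]} E$ iff $v \sqsubseteq^{\mathrm{val}}_{\phi,\, \sqsubseteq^{\mathrm{val}}_\tau} E$.

Proof. (1) Post-compose with part (2).

(2) By induction on $\phi$:

Case: $\phi = t$. Since $t[\tau] = \tau$, we need to show that $v \sqsubseteq^{\mathrm{val}}_\tau E$ iff $v \sqsubseteq^{\mathrm{val}}_{t,\, \sqsubseteq^{\mathrm{val}}_\tau} E$, which is true by definition.

Case: $t$ not free in $\phi$. We need to show $v \sqsubseteq^{\mathrm{val}}_\phi E$ iff $v \sqsubseteq^{\mathrm{val}}_{\phi,\, \sqsubseteq^{\mathrm{val}}_\tau} E$, which is true by definition.

Case: $\phi = \phi_0 \times \phi_1$.
$$\begin{aligned}
& v \sqsubseteq^{\mathrm{val}}_{(\phi_0 \times \phi_1)[\tau]} E \\
\text{iff}\ & v = (v_0, v_1) \text{ where } v_0 \sqsubseteq^{\mathrm{val}}_{\phi_0[\tau]} \pi_0 E \text{ and } v_1 \sqsubseteq^{\mathrm{val}}_{\phi_1[\tau]} \pi_1 E && \text{(by definition)} \\
\text{iff}\ & v = (v_0, v_1) \text{ where } v_0 \sqsubseteq^{\mathrm{val}}_{\phi_0,\, \sqsubseteq^{\mathrm{val}}_\tau} \pi_0 E \text{ and } v_1 \sqsubseteq^{\mathrm{val}}_{\phi_1,\, \sqsubseteq^{\mathrm{val}}_\tau} \pi_1 E && \text{(by IH)} \\
\text{iff}\ & v \sqsubseteq^{\mathrm{val}}_{\phi_0 \times \phi_1,\, \sqsubseteq^{\mathrm{val}}_\tau} E && \text{(by definition).}
\end{aligned}$$

Case: $\phi = \sigma \to \phi_0$ ($t$ not free in $\sigma$).
$$\begin{aligned}
& v \sqsubseteq^{\mathrm{val}}_{(\sigma \to \phi_0)[\tau]} E_1 \\
\text{iff}\ & v = \lambda x.e_1 \text{ where for all } v' \sqsubseteq^{\mathrm{val}}_\sigma E,\ e_1[v'/x] \sqsubseteq_{\phi_0[\tau]} (E_1\,E) && \text{(by definition)} \\
\text{iff}\ & v = \lambda x.e_1 \text{ where for all } v' \sqsubseteq^{\mathrm{val}}_\sigma E,\ e_1[v'/x] \sqsubseteq_{\phi_0,\, \sqsubseteq^{\mathrm{val}}_\tau} (E_1\,E) && \text{(by IH (1))} \\
\text{iff}\ & v \sqsubseteq^{\mathrm{val}}_{\sigma \to \phi_0,\, \sqsubseteq^{\mathrm{val}}_\tau} E_1 && \text{(by definition).}
\end{aligned}$$

□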


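Lemma 5.4. If $v_i \sqsubseteq^{\mathrm{val}}_{\tau_i} E_i$ for $i = 0, 1$, then $(v_0, v_1) \sqsubseteq^{\mathrm{val}}_{\tau_0 \times \tau_1} (E_0, E_1)$.

Proof. We need to show that $v_i \sqsubseteq^{\mathrm{val}}_{\tau_i} \pi_i(E_0, E_1)$. By the step rule for pairs we have that $E_i \le \pi_i(E_0, E_1)$, and so by weakening it suffices to show that $v_i \sqsubseteq^{\mathrm{val}}_{\tau_i} E_i$, which is given. □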
5.2. The Fundamental Theorem. First we state two lemmas which say that, when applied to related arguments, source-language map is bounded by complexity-language map, and that source-language rec is bounded by complexity-language rec.

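Lemma 5.5 (Map). Suppose:

(1) $x : \tau_0 \vdash v_1 : \tau_1$ and $\emptyset \vdash v_0 : \phi[\tau_0]$;

(2) $x : \langle\langle\tau_0\rangle\rangle \vdash E_1 : \langle\langle\tau_1\rangle\rangle$ and $\emptyset \vdash E_0 : \langle\langle\phi\rangle\rangle[\langle\langle\tau_0\rangle\rangle]$;

(3) $\mathcal{E} :: v_0 \sqsubseteq^{\mathrm{val}}_{\phi,\, \sqsubseteq^{\mathrm{val}}_{\tau_0}} E_0$;

(4) whenever $\mathcal{E}'$ is a subderivation of $\mathcal{E}$ such that $\mathcal{E}' :: v'_0 \sqsubseteq^{\mathrm{val}}_{\tau_0} E'_0$, we have $v_1[v'_0/x] \sqsubseteq^{\mathrm{val}}_{\tau_1} E_1[E'_0/x]$; and

(5) $\mathrm{map}^\phi(x.v_1, v_0) \Downarrow^n v$.

Then $n = 0$ and $v \sqsubseteq^{\mathrm{val}}_{\phi[\tau_1]} \mathrm{map}^{\langle\langle\phi\rangle\rangle}(x.E_1, E_0)$.

Proof. The proof is by induction on $\phi$. Lemma 2.4 shows that $n = 0$.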

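Case: $\phi = t$. Then $\mathrm{map}^t(x.v_1, v_0) \Downarrow^n v$ implies that $v_1[v_0/x] \Downarrow^n v$, and $\mathrm{map}^t(x.E_1, E_0) = E_1[E_0/x]$. By (3), $v_0 \sqsubseteq^{\mathrm{val}}_{t,\, \sqsubseteq^{\mathrm{val}}_{\tau_0}} E_0$, and so by definition $v_0 \sqsubseteq^{\mathrm{val}}_{\tau_0} E_0$. Hence by (4), $v = v_1[v_0/x] \sqsubseteq^{\mathrm{val}}_{\tau_1} E_1[E_0/x]$.

Case: $\phi = \tau$ ($t \notin \mathrm{fv}(\tau)$). This follows directly from the assumptions and definitions.

Case: $\phi = \phi_0 \times \phi_1$. Then $v_0 = (v_{00}, v_{01})$ and by inversion we have
$$\frac{\mathrm{map}^{\phi_0}(x.v_1, v_{00}) \Downarrow^0 w_0 \qquad \mathrm{map}^{\phi_1}(x.v_1, v_{01}) \Downarrow^0 w_1}{\mathrm{map}^{\phi_0 \times \phi_1}(x.v_1, (v_{00}, v_{01})) \Downarrow^0 (w_0, w_1)}$$
We also have $\mathcal{E}$-subderivations $\mathcal{E}_{0i} :: v_{0i} \sqsubseteq^{\mathrm{val}}_{\phi_i,\, \sqsubseteq^{\mathrm{val}}_{\tau_0}} \pi_i(E_0)$. Any subderivation of $\mathcal{E}_{0i}$ is a subderivation of $\mathcal{E}$, and so the induction hypothesis applies to $v_{0i}$ and $\pi_i E_0$, from which we conclude that $w_i \sqsubseteq^{\mathrm{val}}_{\phi_i[\tau_1]} \mathrm{map}^{\langle\langle\phi_i\rangle\rangle}(x.E_1, \pi_i E_0)$. Thus we have that
$$\begin{aligned}
v = (w_0, w_1) &\sqsubseteq^{\mathrm{val}}_{(\phi_0 \times \phi_1)[\tau_1]} (\mathrm{map}^{\langle\langle\phi_0\rangle\rangle}(x.E_1, \pi_0 E_0), \mathrm{map}^{\langle\langle\phi_1\rangle\rangle}(x.E_1, \pi_1 E_0)) && \text{(Lemma 5.4)} \\
&= \mathrm{map}^{\langle\langle\phi_0\rangle\rangle \times \langle\langle\phi_1\rangle\rangle}(x.E_1, E_0) \\
&= \mathrm{map}^{\langle\langle\phi_0 \times \phi_1\rangle\rangle}(x.E_1, E_0).
\end{aligned}$$

Case: $\phi = \tau \to \phi_0$. Then $v_0 = \lambda y.e_0$ and $\mathcal{E}$ proves that for all $v' \sqsubseteq^{\mathrm{val}}_\tau E'$, $e_0[v'/y] \sqsubseteq_{\phi_0,\, \sqsubseteq^{\mathrm{val}}_{\tau_0}} E_0(E')$. Since $v_0 = \lambda y.e_0$, $v = \lambda y.\mathrm{let}(e_0, z.\mathrm{map}^{\phi_0}(x.v_1, z))$, and so we must show that $\lambda y.\mathrm{let}(e_0, z.\mathrm{map}^{\phi_0}(x.v_1, z)) \sqsubseteq^{\mathrm{val}}_{\tau \to \phi_0[\tau_1]} \mathrm{map}^{\langle\langle\tau \to \phi_0\rangle\rangle}(x.E_1, E_0)$. To do so, suppose $w \sqsubseteq^{\mathrm{val}}_\tau E$; we must show that
$$\mathrm{let}(e_0[w/y], z.\mathrm{map}^{\phi_0}(x.v_1, z)) \sqsubseteq_{\phi_0[\tau_1]} \mathrm{map}^{\|\phi_0\|}(x.E_1, E_0(E)). \qquad (*)$$
Suppose
$$\frac{e_0[w/y] \Downarrow^{n_0} w_0 \qquad \mathrm{map}^{\phi_0}(x.v_1, w_0) \Downarrow^{n_1} v'}{\mathrm{let}(e_0[w/y], z.\mathrm{map}^{\phi_0}(x.v_1, z)) \Downarrow^{n_0 + n_1} v'}$$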
^We could have said $\mathrm{map}^\phi(x.v_1, v_0) \sqsubseteq_{\phi[\tau_1]} (0, \mathrm{map}^{\langle\langle\phi\rangle\rangle}(x.E_1, E_0))$, but this version of the lemma avoids needing the symmetric copy of the step rule for pairs.
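Since $w \sqsubseteq^{\mathrm{val}}_\tau E$, we have that $\mathcal{E}$ derives $e_0[w/y] \sqsubseteq_{\phi_0,\, \sqsubseteq^{\mathrm{val}}_{\tau_0}} E_0(E)$, and hence we have a subderivation $\mathcal{E}_0$ of $\mathcal{E}$ such that $\mathcal{E}_0 :: w_0 \sqsubseteq^{\mathrm{val}}_{\phi_0,\, \sqsubseteq^{\mathrm{val}}_{\tau_0}} (E_0(E))_p$. We now verify that (4) holds for $\mathcal{E}_0$ so that we can apply the induction hypothesis to $\mathrm{map}^{\phi_0}(x.v_1, w_0)$. So suppose that $\mathcal{E}'_0$ is a subderivation of $\mathcal{E}_0$ such that $\mathcal{E}'_0 :: w'_0 \sqsubseteq^{\mathrm{val}}_{\tau_0} E'_0$. We need to show that $v_1[w'_0/x] \sqsubseteq^{\mathrm{val}}_{\tau_1} E_1[E'_0/x]$, and to do so it suffices to note that $\mathcal{E}'_0$ is a subderivation of $\mathcal{E}_0$, which in turn is a subderivation of $\mathcal{E}$.

We can now apply the induction hypothesis to conclude that $n_1 = 0$ and so:
$$n_0 + n_1 = n_0 \le (E_0\,E)_c = (\mathrm{map}^{\|\phi_0\|}(x.E_1, E_0\,E))_c$$
$$v' \sqsubseteq^{\mathrm{val}}_{\phi_0[\tau_1]} \mathrm{map}^{\langle\langle\phi_0\rangle\rangle}(x.E_1, (E_0\,E)_p) = (\mathrm{map}^{\|\phi_0\|}(x.E_1, E_0\,E))_p.$$
Using β for pairs, these are the two conditions that must be verified to show (*), so this completes the proof. □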

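Lemma 5.6 (Recursor). Fix a datatype declaration $\mathrm{datatype}\ \delta = \overline{C \text{ of } \phi_C}$. If $v \sqsubseteq^{\mathrm{val}}_\delta E$ and, for all $C$, $e_C \sqsubseteq_\tau E_C$ (as open terms with $x : \phi_C[\delta \times \mathrm{susp}\,\tau]$), then $\mathrm{rec}(v, \overline{C \mapsto x.e_C}) \sqsubseteq_\tau \mathrm{rec}(E, \overline{C \mapsto x.1 +_c E_C})$.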

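Proof. By induction on $v \sqsubseteq^{\mathrm{val}}_\delta E$. The only case is
$$\frac{C : (\phi_C \to \delta) \in \psi \qquad v' \sqsubseteq^{\mathrm{val}}_{\phi_C,\, \sqsubseteq^{\mathrm{val}}_\delta} E' \qquad C\,E' \le_\delta E}{C\,v' \sqsubseteq^{\mathrm{val}}_\delta E}\ (\dagger)$$
Assume $\mathrm{rec}(C\,v', \overline{C \mapsto x.e_C})$ evaluates. Then by inversion and Lemma 2.3 it was by
$$\frac{C\,v' \Downarrow^0 C\,v' \qquad \mathrm{map}^{\phi_C}(y.(y, \mathrm{delay}(\mathrm{rec}(y, \overline{C \mapsto x.e_C}))), v') \Downarrow^0 v'' \qquad e_C[v''/x] \Downarrow^{n_2} v}{\mathrm{rec}(C\,v', \overline{C \mapsto x.e_C}) \Downarrow^{1 + n_2} v}\ (*)$$
Using the premise that $C\,E' \le_\delta E$ from (†), β for datatypes, and congruence, we note that
$$\begin{aligned}
\mathrm{rec}(E, \overline{C \mapsto x.1 +_c E_C}) &\ge \mathrm{rec}(C\,E', \overline{C \mapsto x.1 +_c E_C}) \\
&\ge 1 +_c E_C[\mathrm{map}^{\langle\langle\phi_C\rangle\rangle}(y.(y, \mathrm{rec}(y, \overline{C \mapsto x.1 +_c E_C})), E')/x].
\end{aligned}$$
Let us write $E^*$ for $\mathrm{map}^{\langle\langle\phi_C\rangle\rangle}(y.(y, \mathrm{rec}(y, \overline{C \mapsto x.1 +_c E_C})), E')$. Thus by congruence, transitivity, weakening, and β for pairs, it suffices to show
$$1 + n_2 \le 1 + (E_C[E^*/x])_c \qquad v \sqsubseteq^{\mathrm{val}}_\tau (E_C[E^*/x])_p.$$
By congruence for +, for the first goal it suffices to show $n_2 \le (E_C[E^*/x])_c$. Thus, if we can show $e_C[v''/x] \sqsubseteq_\tau E_C[E^*/x]$, then applying it to the third evaluation premise of (*) gives the result. We can use our assumption that $e_C \sqsubseteq_\tau E_C$, as long as we show $v'' \sqsubseteq^{\mathrm{val}}_{\phi_C[\delta \times \mathrm{susp}\,\tau]} E^*$. To do so, we use Lemma 5.5 applied to the second evaluation premise of (*) with
$$v_0 = v' \qquad x.v_1 = y.(y, \mathrm{delay}(\mathrm{rec}(y, \overline{C \mapsto x.e_C})))$$
$$E_0 = E' \qquad x.E_1 = y.(y, \mathrm{rec}(y, \overline{C \mapsto x.1 +_c E_C}))$$
We have $\mathcal{E} :: v' \sqsubseteq^{\mathrm{val}}_{\phi_C,\, \sqsubseteq^{\mathrm{val}}_\delta} E'$ from the second premise of (†). Thus, to finish applying the lemma, we need to show that for all R-position subderivations of $\mathcal{E}$ deriving $v'_1 \sqsubseteq^{\mathrm{val}}_\delta E'_1$,
$$(v'_1, \mathrm{delay}(\mathrm{rec}(v'_1, \overline{C \mapsto x.e_C}))) \sqsubseteq^{\mathrm{val}}_{\delta \times \mathrm{susp}\,\tau} (E'_1, \mathrm{rec}(E'_1, \overline{C \mapsto x.1 +_c E_C})).$$
By definition of value bounding at product types, weakening, and β for pairs, it suffices to show
$$v'_1 \sqsubseteq^{\mathrm{val}}_\delta E'_1 \qquad \mathrm{delay}(\mathrm{rec}(v'_1, \overline{C \mapsto x.e_C})) \sqsubseteq^{\mathrm{val}}_{\mathrm{susp}\,\tau} \mathrm{rec}(E'_1, \overline{C \mapsto x.1 +_c E_C}).$$
The former we have, and for the latter by definition it suffices to show
$$\mathrm{rec}(v'_1, \overline{C \mapsto x.e_C}) \sqsubseteq_\tau \mathrm{rec}(E'_1, \overline{C \mapsto x.1 +_c E_C}).$$
Because $v'_1 \sqsubseteq^{\mathrm{val}}_\delta E'_1$ is an R-subderivation of $v' \sqsubseteq^{\mathrm{val}}_{\phi_C,\, \sqsubseteq^{\mathrm{val}}_\delta} E'$, and therefore a strict subderivation of $C\,v' \sqsubseteq^{\mathrm{val}}_\delta E$, we can use the inductive hypothesis on it, which gives exactly what we needed to show. □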


Theorem 5.7 (Bounding Theorem). If $\gamma \vdash e : \tau$, then $e \sqsubseteq_\tau \|e\|$.

Proof. The proof is by induction on the derivation of $\gamma \vdash e : \tau$, for arbitrary substitutions $\theta \sqsubseteq^{\mathrm{sub}} \Theta$. In each case we state the last line of the derivation, taking as given the premises of the typing rules in Figure 2.

Case: $\gamma, x : \tau \vdash x : \tau$. By definition $\|x\| = (0, x)$. Write $x[\theta] = v$ and $(0, x)[\Theta] = (0, E)$, where by assumption $v \sqsubseteq^{\mathrm{val}}_\tau E$. We must show that $v \sqsubseteq_\tau (0, E)$. Assume $v \Downarrow^n v'$. Then by inversion (Lemma 2.3) $n \le 0$ and $v' = v$. Thus, by transitivity and β for pairs, $n \le (0, E)_c$, and by weakening and β for pairs $v' \sqsubseteq^{\mathrm{val}}_\tau (0, E)_p$.

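Case: $\gamma \vdash (e_0, e_1) : \tau_0 \times \tau_1$. Expanding the definitions, we need to show
$$(e_0[\theta], e_1[\theta]) \sqsubseteq_{\tau_0 \times \tau_1} ((E_0)_c + (E_1)_c, ((E_0)_p, (E_1)_p))$$
where $E_0 = \|e_0\|[\Theta]$ and $E_1 = \|e_1\|[\Theta]$. Suppose
$$\frac{e_0[\theta] \Downarrow^{n_0} v_0 \qquad e_1[\theta] \Downarrow^{n_1} v_1}{(e_0[\theta], e_1[\theta]) \Downarrow^{n_0 + n_1} (v_0, v_1)}$$
By the IH we have that $e_i[\theta] \sqsubseteq_{\tau_i} E_i$, and hence $n_i \le (E_i)_c$ and $v_i \sqsubseteq^{\mathrm{val}}_{\tau_i} (E_i)_p$ for $i = 0, 1$. Thus we conclude that
$$n_0 + n_1 \le (E_0)_c + (E_1)_c \qquad (v_0, v_1) \sqsubseteq^{\mathrm{val}}_{\tau_0 \times \tau_1} ((E_0)_p, (E_1)_p)$$
(the latter by Lemma 5.4), and the result follows by weakening and β for pairs.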

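Case: $\gamma \vdash \mathrm{split}(e_0, x_0.x_1.e_1) : \tau$. Expanding definitions, we need to show
$$\mathrm{split}(e_0[\theta], x_0.x_1.e_1[\theta, x_0/x_0, x_1/x_1]) \sqsubseteq_\tau (E_0)_c +_c E_1 = ((E_0)_c + (E_1)_c, (E_1)_p)$$
where $E_0 = \|e_0\|[\Theta]$ and $E_1 = \|e_1\|[\Theta, \pi_0 (E_0)_p/x_0, \pi_1 (E_0)_p/x_1]$. Suppose
$$\frac{e_0[\theta] \Downarrow^{n_0} (v_0, v_1) \qquad e_1[\theta, v_0/x_0, v_1/x_1] \Downarrow^{n_1} v}{\mathrm{split}(e_0[\theta], x_0.x_1.e_1) \Downarrow^{n_0 + n_1} v}$$
We apply the induction hypothesis as follows:

(1) From $e_0[\theta] \sqsubseteq_{\tau_0 \times \tau_1} E_0$:

(a) $n_0 \le (E_0)_c$;

(b) $(v_0, v_1) \sqsubseteq^{\mathrm{val}}_{\tau_0 \times \tau_1} (E_0)_p$, and hence $v_i \sqsubseteq^{\mathrm{val}}_{\tau_i} \pi_i((E_0)_p)$ for $i = 0, 1$.

(2) From $e_1 \sqsubseteq_\tau \|e_1\|$, $\theta \sqsubseteq^{\mathrm{sub}} \Theta$, and $v_i \sqsubseteq^{\mathrm{val}}_{\tau_i} \pi_i((E_0)_p)$ for $i = 0, 1$:

(a) $\theta, v_0/x_0, v_1/x_1 \sqsubseteq^{\mathrm{sub}} \Theta, \pi_0((E_0)_p)/x_0, \pi_1((E_0)_p)/x_1$, and hence

(b) $n_1 \le (E_1)_c$;

(c) $v \sqsubseteq^{\mathrm{val}}_\tau (E_1)_p$.

Thus we conclude that
$$n_0 + n_1 \le (E_0)_c + (E_1)_c \qquad v \sqsubseteq^{\mathrm{val}}_\tau (E_1)_p$$
and the result follows by monotonicity of +, weakening, and β for pairs.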

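Case: $\gamma \vdash \lambda x.e : \sigma \to \tau$. Expanding the definitions,
$$(\lambda x.e)[\theta] = \lambda x.e[\theta, x/x] \qquad \|\lambda x.e\|[\Theta] = (0, \lambda x.\|e\|[\Theta, x/x])$$
Assume $\lambda x.e[\theta, x/x]$ evaluates. By inversion we have $\lambda x.e[\theta, x/x] \Downarrow^0 \lambda x.e[\theta, x/x]$. Applying transitivity/weakening and β for pairs, we need to show that $0 \le 0$ (trivial) and $\lambda x.e[\theta, x/x] \sqsubseteq^{\mathrm{val}}_{\sigma \to \tau} \lambda x.\|e\|[\Theta, x/x]$. Assume $v_1 \sqsubseteq^{\mathrm{val}}_\sigma E_1$; we need to show $e[\theta, x/x][v_1/x] \sqsubseteq_\tau (\lambda x.\|e\|[\Theta, x/x])\,E_1$. By weakening, β for functions, and Lemmas 2.2 and 3.1, it suffices to show $e[\theta, v_1/x] \sqsubseteq_\tau \|e\|[\Theta, E_1/x]$. Because $\theta \sqsubseteq^{\mathrm{sub}} \Theta$ and $v_1 \sqsubseteq^{\mathrm{val}}_\sigma E_1$, we have $\theta, v_1/x \sqsubseteq^{\mathrm{sub}} \Theta, E_1/x$, so the IH gives the result.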

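Case: $\gamma \vdash e_0\,e_1 : \tau$. By definition, $(e_0\,e_1)[\theta] = e_0[\theta]\,e_1[\theta]$ and $\|e_0\,e_1\|[\Theta] = ((E_0)_c + (E_1)_c + E_c, E_p)$, where $E_i = \|e_i\|[\Theta]$ for $i = 0, 1$ and $E = (E_0)_p\,(E_1)_p$. Suppose that
$$\frac{e_0[\theta] \Downarrow^{n_0} \lambda x.e'_0 \qquad e_1[\theta] \Downarrow^{n_1} v_1 \qquad e'_0[v_1/x] \Downarrow^n v}{e_0[\theta]\,e_1[\theta] \Downarrow^{n_0 + n_1 + n} v}$$
We have the following facts from the induction hypothesis: (1) from $e_0[\theta] \sqsubseteq E_0$: (a) $n_0 \le (E_0)_c$ and (b) $\lambda x.e'_0 \sqsubseteq^{\mathrm{val}}_{\sigma \to \tau} (E_0)_p$; (2) from $e_1[\theta] \sqsubseteq E_1$: (a) $n_1 \le (E_1)_c$ and (b) $v_1 \sqsubseteq^{\mathrm{val}}_\sigma (E_1)_p$; (3) from (1b), (2b), and the definition of $\sqsubseteq^{\mathrm{val}}_{\sigma \to \tau}$, $e'_0[v_1/x] \sqsubseteq_\tau E$, so (a) $n \le E_c$ and (b) $v \sqsubseteq^{\mathrm{val}}_\tau E_p$. Thus we conclude
$$n_0 + n_1 + n \le (E_0)_c + (E_1)_c + E_c \qquad v \sqsubseteq^{\mathrm{val}}_\tau E_p$$
and the result follows from weakening and β for pairs.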

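Case: $\gamma \vdash \mathrm{delay}(e) : \mathrm{susp}\,\tau$. Expanding definitions, we need to show $\mathrm{delay}(e[\theta]) \sqsubseteq_{\mathrm{susp}\,\tau} (0, \|e\|[\Theta])$, so suppose $\mathrm{delay}(e[\theta]) \Downarrow^0 \mathrm{delay}(e[\theta])$. We have $0 \le (0, \ldots)_c$ by β for pairs. For the potential goal, we must show that $\mathrm{delay}(e[\theta]) \sqsubseteq^{\mathrm{val}}_{\mathrm{susp}\,\tau} \|e\|[\Theta]$. By definition, this means showing $e[\theta] \sqsubseteq_\tau \|e\|[\Theta]$, which is exactly the IH. The result follows from weakening and β for pairs.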

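Case: $\gamma \vdash \mathrm{force}(e) : \tau$. Expanding definitions, we need to show $\mathrm{force}(e[\theta]) \sqsubseteq_\tau (E_c + (E_p)_c, (E_p)_p)$, where $E = \|e\|[\Theta]$. Suppose
$$\frac{e[\theta] \Downarrow^{n_0} \mathrm{delay}(e') \qquad e' \Downarrow^{n'} v}{\mathrm{force}(e[\theta]) \Downarrow^{n_0 + n'} v}$$
Since $e[\theta] \sqsubseteq_{\mathrm{susp}\,\tau} E$, we have that $n_0 \le E_c$ and $\mathrm{delay}(e') \sqsubseteq^{\mathrm{val}}_{\mathrm{susp}\,\tau} E_p$. From the definition of $\sqsubseteq^{\mathrm{val}}_{\mathrm{susp}\,\tau}$, $e' \sqsubseteq_\tau E_p$, and hence $n' \le (E_p)_c$ and $v \sqsubseteq^{\mathrm{val}}_\tau (E_p)_p$. The result follows from monotonicity of + and β for pairs.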

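Case: $\gamma \vdash C\,e : \delta$. We must show that $C\,e[\theta] \sqsubseteq_\delta (E_c, C(E_p))$, where $E = \|e\|[\Theta]$. Suppose
$$\frac{e[\theta] \Downarrow^n v}{C\,e[\theta] \Downarrow^n C\,v}$$
Since $e[\theta] \sqsubseteq_{\phi_C[\delta]} E$, we have $n \le E_c$ (satisfying the cost goal) and $v \sqsubseteq^{\mathrm{val}}_{\phi_C[\delta]} E_p$. By Lemma 5.3, $v \sqsubseteq^{\mathrm{val}}_{\phi_C,\, \sqsubseteq^{\mathrm{val}}_\delta} E_p$. Since $C(E_p) \le C(E_p)$ by reflexivity, we have that $C\,v \sqsubseteq^{\mathrm{val}}_\delta C(E_p)$ by definition of $\sqsubseteq^{\mathrm{val}}_\delta$.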
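Case: $\gamma \vdash \mathrm{rec}(e, \overline{C \mapsto x.e_C}) : \tau$. We need to show
$$\mathrm{rec}(e[\theta], \overline{C \mapsto x.e_C[\theta, x/x]}) \sqsubseteq_\tau (E_c + (E_r)_c, (E_r)_p)$$
where $E = \|e\|[\Theta]$ and $E_r = \mathrm{rec}(E_p, \overline{C \mapsto x.(1 +_c \|e_C\|[\Theta, x/x])})$. Suppose
$$\frac{e[\theta] \Downarrow^{n_0} C\,v_0 \qquad \mathrm{map}^{\phi_C}(y.(y, \mathrm{delay}(\mathrm{rec}(y, \overline{C \mapsto x.e_C[\theta, x/x]}))), v_0) \Downarrow^0 v_1 \qquad e_C[\theta, v_1/x] \Downarrow^{n_2} v}{\mathrm{rec}(e[\theta], \overline{C \mapsto x.e_C[\theta, x/x]}) \Downarrow^{1 + n_0 + n_2} v}$$
By the induction hypothesis $e[\theta] \sqsubseteq_\delta E$, so $n_0 \le E_c$ and $C\,v_0 \sqsubseteq^{\mathrm{val}}_\delta E_p$. By Lemma 2.3 we can derive
$$\frac{C\,v_0 \Downarrow^0 C\,v_0 \qquad \mathrm{map}^{\phi_C}(y.(y, \mathrm{delay}(\mathrm{rec}(y, \overline{C \mapsto x.e_C[\theta, x/x]}))), v_0) \Downarrow^0 v_1 \qquad e_C[\theta, v_1/x] \Downarrow^{n_2} v}{\mathrm{rec}(C\,v_0, \overline{C \mapsto x.e_C[\theta, x/x]}) \Downarrow^{1 + n_2} v}$$
So by Lemma 5.6 we have that $1 + n_2 \le (E_r)_c$ and $v \sqsubseteq^{\mathrm{val}}_\tau (E_r)_p$. Putting these together, we have what we needed to show:
$$1 + n_0 + n_2 \le E_c + (E_r)_c \qquad v \sqsubseteq^{\mathrm{val}}_\tau (E_r)_p$$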

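Case: $\gamma \vdash \mathrm{map}^\phi(x.v_1, v_0) : \phi[\tau_1]$. Because $v_1$ is a sub-syntactic-class of $e$, we can upcast it and apply the translation $\|\cdot\|$ to it, producing a complexity expression. We must show that
$$\mathrm{map}^\phi(x.v_1[\theta, x/x], v_0[\theta]) \sqsubseteq_{\phi[\tau_1]} (0, \mathrm{map}^{\langle\langle\phi\rangle\rangle}(x.\|v_1\|[\Theta, x/x]_p, \|v_0\|[\Theta]_p)),$$
so suppose $\mathrm{map}^\phi(x.v_1[\theta, x/x], v_0[\theta]) \Downarrow^n v$. By transitivity/weakening with β for pairs, it suffices to show:
$$n \le 0 \qquad v \sqsubseteq^{\mathrm{val}}_{\phi[\tau_1]} \mathrm{map}^{\langle\langle\phi\rangle\rangle}(x.\|v_1\|[\Theta, x/x]_p, \|v_0\|[\Theta]_p) \qquad (*)$$
We will apply Lemma 5.5 with
$$v_0 := v_0[\theta] \qquad v_1 := v_1[\theta, x/x] \qquad E_0 := \|v_0\|[\Theta]_p \qquad E_1 := \|v_1\|[\Theta, x/x]_p$$
To establish condition (3), we apply the IH to $v_0$ to conclude that $v_0[\theta] \sqsubseteq_{\phi[\tau_0]} \|v_0\|[\Theta]$. Since $v_0[\theta]$ is a value, by Lemma 2.3 it evaluates to itself. Therefore $v_0[\theta] \sqsubseteq^{\mathrm{val}}_{\phi[\tau_0]} \|v_0\|[\Theta]_p$, and so by Lemma 5.3, $v_0[\theta] \sqsubseteq^{\mathrm{val}}_{\phi,\, \sqsubseteq^{\mathrm{val}}_{\tau_0}} \|v_0\|[\Theta]_p$.

To establish condition (4), assume $v'_0 \sqsubseteq^{\mathrm{val}}_{\tau_0} E'_0$ (which is an R-subderivation of the above, but we won’t use this fact). Using the substitution lemmas, we need to show $v_1[\theta, v'_0/x] \sqsubseteq^{\mathrm{val}}_{\tau_1} \|v_1\|[\Theta, E'_0/x]_p$. Since $\theta, v'_0/x \sqsubseteq^{\mathrm{sub}} \Theta, E'_0/x$, the IH on $v_1$ gives $v_1[\theta, v'_0/x] \sqsubseteq_{\tau_1} \|v_1\|[\Theta, E'_0/x]$; and since $v_1[\theta, v'_0/x]$ is a value, it evaluates to itself, so $v_1[\theta, v'_0/x] \sqsubseteq^{\mathrm{val}}_{\tau_1} \|v_1\|[\Theta, E'_0/x]_p$, which is what we needed to show.

Now we apply Lemma 5.5 to $\mathrm{map}^\phi(x.v_1[\theta, x/x], v_0[\theta]) \Downarrow^n v$ to conclude (*).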

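Case: $\gamma \vdash \mathrm{let}(e_0, x.e_1) : \tau$. Applying the substitution lemmas, we need to show
$$\mathrm{let}(e_0[\theta], x.e_1[\theta, x/x]) \sqsubseteq_\tau ((E_0)_c + (\|e_1\|[\Theta, (E_0)_p/x])_c, (\|e_1\|[\Theta, (E_0)_p/x])_p)$$
where $E_0 = \|e_0\|[\Theta]$.

Assume the let evaluates; then by inversion and applying the substitution lemma,
$$\frac{e_0[\theta] \Downarrow^{n_0} v_0 \qquad e_1[\theta, v_0/x] \Downarrow^{n_1} v_1}{\mathrm{let}(e_0[\theta], x.e_1[\theta, x/x]) \Downarrow^{n_0 + n_1} v_1}$$
Applying the IH to $e_0$ gives $n_0 \le (E_0)_c$ and $v_0 \sqsubseteq^{\mathrm{val}} (E_0)_p$. Therefore $\theta, v_0/x \sqsubseteq^{\mathrm{sub}} \Theta, (E_0)_p/x$, so applying the IH to the evaluation of $e_1[\theta, v_0/x]$ gives
$$n_1 \le (\|e_1\|[\Theta, (E_0)_p/x])_c \qquad v_1 \sqsubseteq^{\mathrm{val}}_\tau (\|e_1\|[\Theta, (E_0)_p/x])_p.$$
Monotonicity of + gives $n_0 + n_1 \le (E_0)_c + (\|e_1\|[\Theta, (E_0)_p/x])_c$, so transitivity/weakening and β for pairs give the result. □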


6. Models of the Complexity Language 

A model of the complexity language consists of an interpretation of types as preorders, and of terms as maps between elements of those preorders, validating the rules of Figure 7. The congruence contexts C, but not all terms, need to be monotone maps.

6.1. The Size-Based Complexity Semantics. We showed in Section 4 that the size-based semantics interprets the syntax of the complexity language; it is also a model of the preorder rules of Figure 7. Congruence is established by induction on C; we do not need programmer-defined size functions to be monotonic, because there is no congruence context for datatype constructors. The step rule for the recursor is verified as follows:

$$\begin{aligned}
[\![\mathrm{rec}(C\,E_0, \overline{C \mapsto x.E_C})]\!]\xi &= \bigvee_{\mathit{size}\,z \le [\![C\,E_0]\!]\xi} \mathit{case}(z, (\ldots, f_C, \ldots)) \\
&= \bigvee_{\mathit{size}\,z \le \mathit{size}(C[\![E_0]\!]\xi)} \mathit{case}(z, (\ldots, f_C, \ldots)) \\
&\ge \mathit{case}(C[\![E_0]\!]\xi, (\ldots, f_C, \ldots)) \\
&= [\![E_C]\!]\xi\{x \mapsto [\![\mathrm{map}^{\langle\langle\phi_C\rangle\rangle}(w.(w, \mathrm{rec}(w, \overline{C \mapsto x.E_C})), E_0)]\!]\xi\}.
\end{aligned}$$

Therefore, Theorem 4.2 is a corollary of Theorem 5.7.

6.2. Infinite-Width Trees. Infinite-width trees can be defined by a datatype declaration with a function argument, such as

    datatype tree = E of unit | N of int × (nat → tree)

Though every branch in such a tree is of finite length, the height of a tree is in general not a finite natural number. However, the size-based semantics adapts easily: we interpret tree by a suitably large infinite successor ordinal, and then define $\mathit{size}(N(x, f)) = \big(\bigvee_{v \in [\![\mathrm{nat}]\!]} f(v)\big) + 1$.

^Because we can only construct values using rec, we cannot define infinite-length branches (i.e., coinductively-defined data) in our source language.
6.3. A Semantics Without Arbitrary Maximums. The language studied in Danner et al. [2013] can be viewed as a specific signature in the present language. Their language has a type of booleans, a type int of fixed-size integers, and a type list of integer lists. As in Example 4.2, we can treat int and bool as enumerated datatypes with unit-cost operations. The list type is defined as a datatype, and its case and fold operators are easily defined using rec.

For this specific signature, we can give a semantics of the complexity language that does not require arbitrary maximums in the semantics of each type, and where we interpret list by $\mathbb{N}$, the natural numbers. Set $[\![\mathrm{Nil}]\!]\xi = 0$ and $[\![\mathrm{Cons}(E_0, E_1)]\!]\xi = [\![E_1]\!]\xi + 1$. Define a semantic primitive recursion operator $\mathit{rec}^\sigma : \mathbb{N} \times \sigma \times (\mathbb{N} \times \sigma \to \sigma) \to \sigma$ by

$$\mathit{rec}(0, a, f) = a \qquad \mathit{rec}(n + 1, a, f) = a \vee f(n, \mathit{rec}(n, a, f)).$$

Finally, set

$$[\![\mathrm{rec}(E)]\!]\xi = \mathit{rec}([\![E]\!]\xi, [\![E_{\mathrm{Nil}}]\!]\xi, \lambda(n, w).[\![E_{\mathrm{Cons}}]\!]\xi\{x, xs, r \mapsto 1, n, w\}),$$

where $\mathrm{rec}(E)$ abbreviates $\mathrm{rec}(E, \mathrm{Nil} \mapsto E_{\mathrm{Nil}}, \mathrm{Cons} \mapsto (x, (xs, r)).E_{\mathrm{Cons}})$. Verifying the preorder rules from Figure 7 is straightforward in all cases except the last, which we verify as follows:


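\[
\begin{aligned}
[\![\mathsf{rec}(\mathsf{Nil})]\!]\xi
  &= [\![E_{\mathsf{Nil}}]\!]\xi\{x \mapsto 1\} \\
  &= [\![E_{\mathsf{Nil}}[()/x]]\!]\xi \\
  &= [\![E_{\mathsf{Nil}}[\mathsf{map}^{\mathsf{unit}}(y.(y, \mathsf{rec}(y)), ())/x]]\!]\xi
\end{aligned}
\]

\[
\begin{aligned}
[\![\mathsf{rec}(\mathsf{Cons}(E_0, E_1))]\!]\xi
  &= [\![E_{\mathsf{Nil}}]\!]\xi\{x \mapsto 1\} \vee
     [\![E_{\mathsf{Cons}}]\!]\xi\{x, xs, r \mapsto 1, [\![E_1]\!]\xi, \mathit{rec}([\![E_1]\!]\xi, \ldots)\} \\
  &\ge [\![E_{\mathsf{Cons}}]\!]\xi\{x, xs, r \mapsto 1, [\![E_1]\!]\xi, \mathit{rec}([\![E_1]\!]\xi, \ldots)\} \\
  &= [\![E_{\mathsf{Cons}}[(E_0, (E_1, \mathsf{rec}(E_1)))/(x, (xs, r))]]\!]\xi \\
  &= [\![E_{\mathsf{Cons}}[\mathsf{map}(y.(y, \mathsf{rec}(y)), (E_0, E_1))/(x, (xs, r))]]\!]\xi.
\end{aligned}
\]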

A natural question is why we must take $\mathit{rec}(n + 1, a, f) = a \vee f(n, \mathit{rec}(n, a, f))$, since the above
proof seems to carry through with $\mathit{rec}(n + 1, a, f) = f(n, \mathit{rec}(n, a, f))$. The problem is that if
we use this latter definition, then the resulting interpretation fails to satisfy the congruence
axiom for contexts of the form $\mathsf{rec}([\,], \ldots)$.
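Congruence for $\mathsf{rec}([\,], \ldots)$ asks that $E \le E'$ imply $[\![\mathsf{rec}(E)]\!]\xi \le [\![\mathsf{rec}(E')]\!]\xi$. Since list is interpreted by $\mathbb{N}$, the semantic recursor must therefore be monotone in its first argument. Joining with $a$ supplies the base case $\mathit{rec}(1, a, f) \ge \mathit{rec}(0, a, f)$. If $f$ is monotone in both arguments, induction on $n$ then gives monotonicity. Without the join, a monotone but non-inflationary $f$ can make the sequence drop. The numeric check below illustrates this. It assumes $\sigma = \mathbb{N}$ with `max` as the join, and uses a hypothetical constant step function.

```python
def rec_with_join(n, a, f):
    # rec(n + 1, a, f) = a v f(n, rec(n, a, f)), with v taken to be max
    r = a
    for k in range(n):
        r = max(a, f(k, r))
    return r


def rec_without_join(n, a, f):
    # Variant that drops the join: rec(n + 1, a, f) = f(n, rec(n, a, f))
    r = a
    for k in range(n):
        r = f(k, r)
    return r


def is_monotone(seq):
    return all(x <= y for x, y in zip(seq, seq[1:]))


def step(k, w):
    # Hypothetical step: monotone in both arguments, but not inflationary.
    return 0


a = 5
with_join = [rec_with_join(n, a, step) for n in range(4)]
without_join = [rec_without_join(n, a, step) for n in range(4)]
print(with_join, is_monotone(with_join))        # [5, 5, 5, 5] True
print(without_join, is_monotone(without_join))  # [5, 0, 0, 0] False
```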


6.4. Exact Costs. If we wish to reason about exact costs, we can symmetrize the inequalities
in Figure 3 into equalities and add congruence for all contexts, which turns the $E_0 \le E_1$
judgement into a standard notion of definitional equality. Then we can take the term model
in the usual way, interpreting each type as a set of terms quotiented by this definitional
equality. The preorder judgement is interpreted as equality. In this interpretation $\|e\|$ is a
recurrence that gives the exact cost of evaluating e, but reasoning about such a recurrence
involves reasoning about all of the details of the program.
6.5. Infinite Costs. Next, we consider a size-based model in which we drop the “increasing”
requirement on the size functions from Section 4. Rather than requiring a well-founded
partial order for each datatype, we require an arbitrary partial order $(S^\delta, \le_\delta)$, which we also
interpret as a flat CPO (we do not require the interpretation of non-datatypes to be CPOs).
The interpretation of rec expressions is then given in terms of a general fixpoint operator. Define
$\infty = \bigvee \mathbb{N}$ and identify $\infty$ with the bottom element of the CPO ordering. In this setting
the interpretation of a rec expression may fail to terminate and hence, by our
identification, evaluate to $\infty$. This turns out to be exactly the right behavior, as the
following example shows.

Take the standard inductive definition of nat and interpret nat as a one-element
set {1} in the complexity language, so $\mathit{size}_{\mathsf{nat}}$ is a constant function; that is, we declare that
all nat values have the same size. Now compute the interpretation of the identity function:

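\[
\begin{aligned}
[\![\|\mathsf{rec}(y, \mathsf{Zero} \mapsto \mathsf{Zero}, \mathsf{Succ} \mapsto x.\mathsf{Succ}\,x)\|]\!]
  &= \mathit{rec}(1, \mathsf{Zero} \mapsto (0, 1) \mid \mathsf{Succ} \mapsto (x, r).(1 + r_c, 1)) \\
  &= \bigvee_{\mathit{size}\,z \le 1} \mathsf{case}(z, \mathsf{Zero} \mapsto (0, 1) \mid \mathsf{Succ} \mapsto (x, r).(1 + e_c(x), 1))
\end{aligned}
\]

where

\[
e(x) = \mathit{rec}(x, \mathsf{Zero} \mapsto (0, 1) \mid \mathsf{Succ} \mapsto (x, r).(1 + r_c, 1)).
\]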

Since $\mathit{size}(\mathsf{Succ}(1)) = 1 \le 1$, one of the case expressions in the maximum involves $e_c(1)$, so
computing $e_c(1)$ requires computing $e_c(1)$ itself. Thus we have a non-terminating recursion in
computing the complexity. We conclude $[\![\|\mathsf{rec}(\ldots)\|]\!]_c = \infty$; in other words, we can draw no useful
conclusion about the cost of this expression. This is a feature of our approach rather than a bug.
What we have done in this example is to declare that we cannot distinguish values of type nat by
size (they all have the same size), and then attempt to compute the cost of a recursive function
on nats in terms of the size of the recursion argument. The bounding theorem still applies in this
setting, and hence the interpretation gives us a bound on the cost of the computation. In
this case, the bound is just not a useful one; it does not even tell us that the computation
terminates.
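The non-terminating recursion can be simulated by computing the recursor's cost component as a least fixed point via Kleene iteration. The iteration starts from the all-$\infty$ approximation, recalling that $\infty$ is identified with bottom. The sketch below tracks only the cost component. It relies on Python's `math.inf`, for which `1 + inf` and `max(0, inf)` are again `inf`. This matches the strictness of the operations on the flat CPO. The model names and step functions are hypothetical.

```python
import math

# "oo": the cost assigned to a diverging recursion, identified with bottom.
BOTTOM = math.inf


def lfp(step, sizes, max_iters=1000):
    """Kleene iteration: start every size at bottom and reapply `step`
    until the approximation stops changing (or max_iters is reached)."""
    env = {s: BOTTOM for s in sizes}
    for _ in range(max_iters):
        new_env = {s: step(env, s) for s in sizes}
        if new_env == env:
            break
        env = new_env
    return env


# Model A (hypothetical): every nat has size 1, so Zero and Succ(1) both
# have size <= 1 and the Succ branch recurses at size 1 again.
def step_one_size(env, s):
    return max(0, 1 + env[1])


# Model B (hypothetical): size(Succ x) = size(x) + 1, so at size s the
# Succ branch only recurses at sizes m < s.
def step_counting(env, s):
    return max([0] + [1 + env[m] for m in range(s)])


print(lfp(step_one_size, [1]))       # {1: inf}
print(lfp(step_counting, range(4)))  # {0: 0, 1: 1, 2: 2, 3: 3}
```

In model B, one more size becomes finite on each iteration, because the recursion descends to strictly smaller sizes. In model A, the all-bottom approximation is already a fixed point, which is the $\infty$ computed above.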


7. Related Work 


There is a reasonably extensive literature over the last several decades on (semi-)automatically
constructing resource bounds from source code. The earliest work concerns first-order
programs. Wegbreit [1975] describes a system for analyzing simple Lisp programs that produces
closed forms that bound running time. An interesting aspect of this system is that
it is possible to describe probability distributions on the input domain, and the generated
bounds incorporate this information. Rosendahl [1989] proposes a system based on step-counting
functions and abstract interpretation for a first-order subset of Lisp. More recently,
the COSTA project (see, e.g., Albert et al. [2012]) has focused on automatically computing
cost relations for imperative languages (actually, bytecode) and solving them (more on that
in the next section). Debray and Lin [1993] develop a system for analyzing logic programs,
and Navas et al. [2007] extend it to handle user-defined resources.

The Resource Aware ML project (RAML) takes a different approach from the one we have
described here, one based on type assignment. Jost et al. [2010] describe a formalism that
automatically infers linear resource bounds for higher-order programs, provided that the
input program does in fact have a linear resource cost. Hoffmann and Hofmann [2010] and
Hoffmann et al. [2012] extend this work to handle polynomial bounds, though for first-order
programs only, and Hoffmann and Shao [2015] extend it to parallel programs. RAML uses a
source language that is similar to ours, but in which the types are annotated with variables
corresponding to resource usage. Type inference in the annotated system comes down to
solving a set of constraints among these variables. A very nice feature of this work is that
it handles cases in which amortized analysis is typically employed to establish tight bounds,
while our approach can only establish worst-case bounds.

Danielsson [2003] uses an annotated monad (similar to C × −, but dependent on the
cost) to track running time in a dependently typed language, where size reasoning can be
done via types. He emphasizes reasoning about the amortized cost of lazy programs. However,
he relies on explicit annotation of the program, which our complexity translation inserts
automatically, and his correctness theorem is only for closed programs, whereas we use a
logical relation to validate extracted recurrences.

We now turn to work that is closest in spirit to ours, focusing on those aspects related to
analysis of higher-order languages. Le Métayer's [1988] ACE system is a two-stage system
that first converts FP programs (Backus [1978]) to recursive FP programs describing the
number of recursive calls of the source program, then attempts to transform the result
using various program-transformation techniques to obtain a closed form. Shultis [1985]
defines a denotational semantics for a simple higher-order language that models both the
value and the cost of an expression. As a part of the cost model, he develops a system
of “tolls,” which play a role similar to the potentials we define in our work. The tolls and
the semantics are not used directly in calculations, but rather as components in a logic for
reasoning about them. Sands [1990] puts forward a translation scheme in which programs
in a source language are translated into programs in the same language that incorporate
cost information; several source languages are discussed, including a higher-order call-by-value
language. Each identifier f in the source language is associated with a cost closure that
incorporates information about the value f takes on its arguments, the cost of applying f to
arguments, and its arity. Cost closures are intended to address the same issue as our higher-type
potentials: recording information about the future cost of a partially-applied function.


Van Stone [2003] annotates the operational semantics for a higher-order language with cost
information. She then defines a category-theoretic denotational semantics that uses “cost
structures” to capture cost information and shows that the latter is sound with respect to
the former. Benzinger [2004] annotates NuPRL’s call-by-name operational semantics with
complexity estimates. The language for the annotations is left somewhat open so as to
allow greater flexibility. The analysis of the costs is then completed using a combination of
NuPRL’s proof generation and Mathematica. In all of these approaches the cost domain
incorporates information about values in the source language so as to provide exact costs.
Our approach provides a uniform framework that can be more or less precise about the source-language
values that are represented. While we can implement a version that handles exact
costs, we can also implement a version in which we focus just on upper bounds, which we
might hope leads to simpler recurrences.


8. Conclusions and Further Work 

We have described a denotational complexity analysis for a higher-order language with a
general form of inductive datatypes that yields an upper bound on the cost of any well-typed
program in terms of the size of the input. The two steps are to translate each source-language
program e into a program ||e|| in a complexity language, which makes costs explicit, and then
to abstract values to sizes. We prove a bounding theorem for the translation, a consequence
of which is that the cost component of ||e|| is an upper bound on the evaluation cost of e.
The proof of the bounding theorem is purely syntactic, and therefore applies in all models
of the complexity language. By varying the semantics of the complexity language (and in
particular, the notion of size), we can perform analyses at different levels of granularity. We
give several different choices for the notion of size, but ultimately this is too important a
decision to take out of the hands of the user through automation.

The complexity translation of Section 3 can easily be adapted to other cost models. For
example, we could charge different amounts for different steps. Or we could analyze the
work and span of parallel programs by taking C to be series-parallel cost graphs, something
we plan to investigate in future work.
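The following is a rough illustration of the cost domain meant here, not a construction from this paper. A series-parallel cost graph can be represented as a tree whose leaves are unit steps and whose internal nodes compose subgraphs sequentially or in parallel. Work counts all steps, and span is the length of the longest chain of dependent steps. The class and function names are hypothetical choices made for this sketch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Step:
    """A single unit-cost step."""


@dataclass(frozen=True)
class Seq:
    first: "Graph"
    second: "Graph"


@dataclass(frozen=True)
class Par:
    left: "Graph"
    right: "Graph"


Graph = Union[Step, Seq, Par]


def work(g: Graph) -> int:
    # Total number of steps, however they are composed.
    if isinstance(g, Step):
        return 1
    if isinstance(g, Seq):
        return work(g.first) + work(g.second)
    return work(g.left) + work(g.right)


def span(g: Graph) -> int:
    # Longest dependency chain: sequential costs add, parallel costs take the max.
    if isinstance(g, Step):
        return 1
    if isinstance(g, Seq):
        return span(g.first) + span(g.second)
    return max(span(g.left), span(g.right))


# Two independent two-step computations run in parallel, then one more step.
g = Seq(Par(Seq(Step(), Step()), Seq(Step(), Step())), Step())
print(work(g), span(g))  # 5 3
```

Under such a choice, the unit cost of the translation would become a single `Step`. Sequencing in the monad would become `Seq`, and a parallel pair would become `Par`.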

Another direction for future work is to handle different evaluation strategies. Compositionality
is a thorny issue when considering call-by-need evaluation and lazy datatypes,
and as noted by Okasaki [1998], it may be that amortized cost is at least as interesting as
worst-case cost. Sands [1990], Van Stone [2003], and Danielsson [2003] address laziness in
their work, and, as we noted above, RAML already performs amortized analyses.

We plan to extend the source language to handle general recursion. Part of the difficulty
here is that the bounding relation presupposes termination of the source program
(so that the derivation of $e \Downarrow^{n} v$, and hence the cost, is well-defined). One approach would be
to require the user to supply a proof of termination of the program to be analyzed. Or
one could define the operational semantics of the source language coinductively (as done
by, e.g., Leroy and Grall [2009]), thereby allowing explicitly for non-terminating computations.
Another approach is to adapt the partial big-step operational semantics described by
Hoffmann et al. [2012]. Since our source language supports inductive datatype definitions
of the form datatype strm = Cons of unit → nat × strm, adding general recursion will
force us to understand how our complexity semantics plays out in the presence of what are
essentially coinductively defined values. One could also hope to prove termination in the
source language by first extracting complexity bounds and then proving that these bounds
in fact define total functions. Another interesting idea along these lines would be to define
a complexity semantics in which the cost domain is two-valued, with one value representing
termination and the other non-termination (or, more accurately, known termination
and not-known termination); such an approach might be akin to an abstract-interpretation-based
approach to termination analysis.

The programs ||e|| are complex higher-order recurrences that call out for solution techniques.
Benzinger [2004] addresses this idea, as do Albert et al. [2011, 2013] of the COSTA
project. Another relevant aspect of the COSTA work is that their cost relations use nondeterminism;
it would be very interesting to see if we could employ a similar approach instead
of the maximization operators that we used in our examples. Ultimately we should have a
library of tactics for transforming the recurrences produced by the translation function to
closed (possibly asymptotic) forms when possible.